
Computer Architecture Unit 8

[Figure: four CPUs (CPU0–CPU3), each with its own cache and local memory, interconnected through an HT crossbar/bus]

Figure 8.8: NUMA Architecture

2. Shared main memory: In this memory system organisation, every
processor or core has its own private L1 and L2 caches, but all
processors share a common main memory. Although this was the
dominant architecture for small-scale multiprocessors, some recent
architectures have abandoned the shared memory organisation and
switched to the NUMA organisation.
3. Shared L1 cache: This design is used only in chips with explicit
multithreading, where all logical processors share a single pipeline.
4. Shared L2 cache: This design minimises on-chip data replication
and makes more efficient use of cache capacity. Some Chip-level
Multiprocessing (CMP) systems are built with shared L2 caches.
Self Assessment Questions
13. ILP stands for _______________________.
14. TLP is the abbreviation for _____________________.

8.8 Interleaved Memory Organisation


Interleaved Memory Organisation (or Memory Interleaving) is a technique
aimed at enhancing the efficiency of memory usage in a system where
more than one data item or instruction must be fetched simultaneously by
the CPU, as in pipelined processors and vector processors. To
understand the concept, let us consider a system with a CPU and a memory
as shown in figure 8.9.

Manipal University of Jaipur B1648 Page No. 186



[Figure: a CPU with a single MAR and MDR connected to memory over one address bus and one data bus]

Figure 8.9: Interleaved Memory Organisation

As long as the processor requires a single memory read at a time, the
above memory arrangement with a single MAR, a single MDR, a single
address bus and a single data bus is sufficient. However, if more than one
read is required simultaneously, the arrangement fails. This problem can be
overcome by adding as many address and data bus pairs, with their
respective MARs and MDRs, as there are simultaneous reads. But buses are
expensive, since an equal number of bus controllers would be required to
carry out the simultaneous reads.
An alternative technique that handles simultaneous reads with comparatively
low cost overhead is memory interleaving. Under this scheme, the memory
is divided into as many modules as the number of simultaneous reads
required, each with its own MAR and MDR but sharing common data and
address buses. For example, if an instruction-pipelined processor requires
two simultaneous reads at a time, the memory is partitioned into two
modules with two MARs and two MDRs, as shown in figure 8.10.


[Figure: a CPU with two MAR/MDR pairs, each connected to its own memory module over shared data and address buses]

Figure 8.10: MAR and MDR in Interleaved Memory Modules

The memory modules are assigned mutually exclusive memory address
spaces. Suppose, in this case, that memory module 1 is assigned the even
addresses and memory module 2 the odd addresses. Now, when the CPU
needs to fetch two instructions from memory, say those located at
addresses 2 and 3, MAR1 is loaded with address 2. While the first
instruction is being read into MDR1, MAR2 is loaded with address 3.
When both instructions are ready to be read into the CPU from the
respective MDRs, the CPU reads them one after the other from these two
registers. This is an example of a two-way interleaved memory architecture;
an n-way interleaved memory may be designed in a similar way. Without
this technique, two complete sets of address and data buses, MARs and
MDRs would be required to achieve the same objective.
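The even/odd walkthrough above can be sketched as a tiny simulation (an illustrative Python sketch; the function name and zero-based module numbering are assumptions, not part of the original text):

```python
# Two-way low-order interleaving: the lowest address bit selects the
# module, so module 0 holds the even addresses and module 1 the odd
# ones (the text numbers them 1 and 2; here they are 0 and 1).

def module_for(address, modules=2):
    """Return (module index, offset within that module) for an address."""
    return address % modules, address // modules

# Fetching the instructions at addresses 2 and 3 hits different
# modules, so the two reads can proceed in overlapped fashion.
assert module_for(2) == (0, 1)   # even address -> module 0
assert module_for(3) == (1, 1)   # odd address  -> module 1
```

The same mapping generalises to n-way interleaving by raising the `modules` argument.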
This type of modular memory architecture is helpful for systems that use
vector or pipeline processing. By suitably arranging the memory accesses,
the effective memory cycle time is reduced by a factor equal to the number
of memory modules. The same technique is also employed to enhance the
speed of read/write operations in various secondary storage devices, such
as hard disks.


Self Assessment Questions


15. __________________ is a technique aimed at enhancing the
efficiency of memory usage in a system.
16. __________________ share common data and address buses.

8.9 Bandwidth and Fault Tolerance


H. Hellerman (1967) derived an equation to estimate the effective
increase in memory bandwidth through multiway interleaving. A single
memory module is assumed to deliver one word per memory cycle and
thus has a bandwidth of 1.
Memory Bandwidth: The memory bandwidth B of an m-way interleaved
memory is upper-bounded by m and lower-bounded by 1. Hellerman
estimated B as:

B ≈ m^0.56 ≈ √m

where m is the number of interleaved memory modules. This equation
implies that if 16 memory modules are used, the effective memory
bandwidth is approximately four times that of a single module. This
pessimistic estimate is due to the fact that block accesses of various
lengths and accesses of single words are randomly mixed in user programs.
Hellerman's estimate was based on a single-processor system. If memory-
access conflicts from multiple processors, such as the hot-spot problem,
are considered, the effective memory bandwidth will be further reduced.
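Hellerman's estimate can be checked numerically (a sketch; the function name is an assumption):

```python
# Hellerman's estimate B = m**0.56 (roughly the square root of m) of
# the effective bandwidth of an m-way interleaved memory.

def hellerman_bandwidth(m):
    return m ** 0.56

for m in (1, 2, 4, 8, 16):
    b = hellerman_bandwidth(m)
    # B stays between the lower bound 1 and the upper bound m.
    assert 1 <= b <= m

# 16 modules yield roughly a four- to five-fold effective bandwidth,
# matching the "approximately four times" claim in the text.
assert 4 < hellerman_bandwidth(16) < 5
```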
In vector processing, the access time of a long vector with n elements and
stride distance 1 has been estimated by Cragon (1992) as follows. It is
assumed that the n elements are stored in contiguous memory locations in
an m-way interleaved memory system. The average time t1 required to
access one element of the vector is estimated by

t1 = (θ/m)(1 + (m − 1)/n)

where θ is the memory cycle time. As n → ∞ (very long vector),
t1 → θ/m. As n → 1 (scalar access), t1 → θ.
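Cragon's formula and its two limiting cases can be verified numerically (an illustrative sketch; the function name and sample values are assumptions):

```python
def avg_access_time(theta, m, n):
    """Cragon's estimate t1 = (theta/m) * (1 + (m - 1)/n) of the
    average per-element access time for an n-element, stride-1 vector
    stored in m-way interleaved memory with memory cycle time theta."""
    return (theta / m) * (1 + (m - 1) / n)

theta, m = 8.0, 8
# Scalar access (n = 1): one full memory cycle per element.
assert avg_access_time(theta, m, 1) == theta
# Very long vector (n -> infinity): t1 approaches theta / m.
assert abs(avg_access_time(theta, m, 10**6) - theta / m) < 1e-4
```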


Fault Tolerance: High-order and low-order interleaving can be mixed to
generate various interleaved memory organisations. In a high-order
interleaved memory, sequential addresses are assigned within each memory
module. This makes it easier to isolate faulty memory modules in a memory
bank of m modules: when one module's failure is detected, the remaining
modules can still be used by opening a window in the address space. This
fault isolation cannot be carried out in a low-order interleaved memory, in
which a module failure may paralyse the entire memory bank. Thus, low-
order interleaved memory is not fault-tolerant.
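The difference between the two schemes can be illustrated with a small address-mapping sketch (the 16-word memory and 4-module split are arbitrary assumptions for illustration):

```python
# High-order vs low-order interleaving for a 16-word memory split
# across M = 4 modules of 4 words each.
M, WORDS = 4, 16
MODULE_SIZE = WORDS // M

def high_order_module(addr):
    # Upper address bits select the module: sequential addresses stay
    # inside one module, so a failed module removes one contiguous
    # window of the address space and the rest remains usable.
    return addr // MODULE_SIZE

def low_order_module(addr):
    # Lower address bits select the module: sequential addresses are
    # spread across all modules, so one failure touches the whole
    # address space.
    return addr % M

assert [high_order_module(a) for a in range(4)] == [0, 0, 0, 0]
assert [low_order_module(a) for a in range(4)] == [0, 1, 2, 3]
```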
Self Assessment Questions
17. The memory bandwidth is upper-bounded by ____________ and
lower-bounded by __________________.
18. __________________ are assigned in the high-order interleaved
memory in each memory module.

8.10 Consistency Models


Usually, the logical data store is physically distributed and replicated
across several processes. A consistency model acts as a contract between
the data store and the processes: the store is guaranteed to work
correctly only if the processes follow its rules. Such models help in
understanding how simultaneous writes and reads occur in shared
memory. They are applicable to shared-memory multiprocessors, shared
databases and cache coherence algorithms.
Consistency models are divided into two categories: strong and weak.
8.10.1 Strong consistency models
In these models, the operations on shared data are synchronised. The
various strong consistency models are:
i) Strict consistency: As the name suggests, it is very strict. In this type
of consistency, any read on a data item returns the value of the most
recent write to that item. The main drawback of this consistency is that
it depends on absolute global time.
ii) Sequential consistency: In this type of consistency, the result of any
execution is the same as if the read and write operations of all
processes were executed in some sequential order, and the operations
of each individual process appear in this sequence in the order
specified by its program. Figure 8.11 shows the Sequential Consistency
Model.
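The sequential-consistency rule — any global order is legal as long as each process's operations stay in program order — can be illustrated with a small enumeration (a hypothetical Python sketch, not part of the original text; operation labels are invented):

```python
from itertools import combinations

def legal_interleavings(p0, p1):
    """All global orders that keep each process's operations in program
    order -- exactly the orders sequential consistency permits."""
    n = len(p0) + len(p1)
    results = []
    for slots in combinations(range(n), len(p0)):
        it0, it1 = iter(p0), iter(p1)
        results.append([next(it0) if i in slots else next(it1)
                        for i in range(n)])
    return results

# Two processes with two operations each: C(4, 2) = 6 legal orders.
orders = legal_interleavings(["w0(x)", "r0(y)"], ["w1(y)", "r1(x)"])
assert len(orders) == 6
# Every legal order keeps process 0's write before its read.
assert all(o.index("w0(x)") < o.index("r0(y)") for o in orders)
```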
