MULTIPROCESSORS

1. Differentiate Tightly coupled and Loosely coupled system.


Tightly Coupled System                                 Loosely Coupled System
-------------------------------------------------------------------------------------------------------
Tasks and/or processors communicate in a highly        Tasks or processors do not communicate in a
synchronized fashion.                                  synchronized fashion.
Communicates through a common shared memory.           Communicates by message-passing packets.
Shared memory system.                                  Distributed memory system.
Overhead for data exchange is comparatively lower.     Overhead for data exchange is comparatively higher.

2. Explain Time-Shared Common Bus Interconnection Structures.


• A common bus multiprocessor system consists of a number of processors connected through
a common path to a memory unit.
• A time-shared common bus for five processors is shown in figure 10.1.
• Only one processor can communicate with the memory or another processor at any given
time.
• Transfer operations are conducted by the processor that is in control of the bus at the time.

Figure 10.1: Time-shared common bus organization

• Any other processor wishing to initiate a transfer must first determine the availability status
  of the bus, and only after the bus becomes available can it address the destination unit to
  initiate the transfer.
• A command is issued to inform the destination unit what operation is to be performed.
• The receiving unit recognizes its address in the bus and responds to the control signals from
the sender, after which the transfer is initiated.
• A single common-bus system is restricted to one transfer at a time.
• This means that when one processor is communicating with the memory, all other
processors are either busy with internal operations or must be idle waiting for the bus.
• As a consequence, the total overall transfer rate within the system is limited by the speed of
the single path.
• A more economical implementation of a dual bus structure is depicted in figure 10.2.

• Here we have a number of local buses each connected to its own local memory and to one
or more processors.
• Each local bus may be connected to a CPU, an IOP, or any combination of processors.
• A system bus controller links each local bus to a common system bus.
• The I/O devices connected to the local IOP, as well as the local memory, are available to the
local processor.

Figure 10.2: System bus structure for multiprocessors

• The memory connected to the common system bus is shared by all processors.
• If an IOP is connected directly to the system bus, the I/O devices attached to it may be made
available to all processors.
• Only one processor can communicate with the shared memory and other common resources
  through the system bus at any given time (a minimal sketch of this one-transfer-at-a-time
  protocol follows this list).
• The other processors are kept busy communicating with their local memory and I/O devices.
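
To make the bus protocol concrete, here is a minimal Python sketch of the one-transfer-at-a-time behavior described above. The names (TimeSharedBus, Unit, respond) are illustrative assumptions, and the busy flag stands in for the hardware bus-busy line.

```python
# A minimal sketch (assumed names) of the single-bus protocol: a processor
# checks the busy line, seizes the bus, addresses the destination unit,
# issues a command, and then releases the bus for the other processors.

class Unit:
    def __init__(self, address):
        self.address = address

    def respond(self, command, data):
        # The receiving unit recognizes its address and acts on the command.
        print(f"unit {self.address}: {command} {data}")

class TimeSharedBus:
    def __init__(self):
        self.busy = False                  # models the shared bus-busy line

    def transfer(self, dest, command, data):
        if self.busy:                      # bus unavailable: caller must wait
            return False
        self.busy = True                   # only one transfer at a time
        dest.respond(command, data)        # destination decodes and responds
        self.busy = False                  # release the bus
        return True

bus = TimeSharedBus()
bus.transfer(Unit(0b0101), "store", 52)    # prints: unit 5: store 52
```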

3. Explain Multiport Memory Interconnection Structures.


• A multiport memory system employs separate buses between each memory module and
each CPU.
• This is shown in figure 10.3 for four CPUs and four memory modules (MMs).
• Each processor bus is connected to each memory module.
• A processor bus consists of the address, data, and control lines required to communicate
with memory.
• The memory module is said to have four ports and each port accommodates one of the
buses.

• The module must have internal control logic to determine which port will have access to
memory at any given time.
• Memory access conflicts are resolved by assigning fixed priorities to each memory port, as
  sketched after this list.

Figure 10.3: Multiport memory organization


• The priority for memory access associated with each processor may be established by the
physical port position that its bus occupies in each module.
• Thus CPU 1 will have priority over CPU 2, CPU 2 will have priority over CPU 3, and CPU 4 will
have the lowest priority.
• The advantage of the multiport memory organization is the high transfer rate that can be
achieved because of the multiple paths between processors and memory.
• The disadvantage is that it requires expensive memory control logic and a large number of
cables and connectors.
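
The fixed-priority resolution can be captured in a few lines. Below is a minimal Python sketch under the assumption that lower port numbers carry higher priority, matching the CPU 1 > CPU 2 > CPU 3 > CPU 4 ordering above; grant_port is an illustrative name.

```python
# A minimal sketch of fixed-priority port resolution in one memory module:
# the requesting CPU whose port has the lowest number (highest priority) wins.

def grant_port(requests):
    """requests: list of booleans, index 0 = CPU 1's port (highest priority).
    Returns the index of the granted port, or None if nobody is requesting."""
    for port, requesting in enumerate(requests):
        if requesting:
            return port              # first requester in priority order wins
    return None

# CPUs 2 and 4 request the module at once; CPU 2's port is granted.
print(grant_port([False, True, False, True]))   # -> 1
```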

4. Explain Crossbar Switch Interconnection Structures.


• Figure 10.4 shows a crossbar switch interconnection between four CPUs and four memory
modules.
• The small square in each cross point is a switch that determines the path from a processor
to a memory module.
• Each switch point has control logic to set up the transfer path between a processor and
memory.
• It examines the address that is placed in the bus to determine whether its particular module
is being addressed.
• It also resolves multiple requests for access to the same memory module on a predetermined
  priority basis.
• Figure 10.5 shows the functional design of a crossbar switch connected to one memory
  module.
• The circuit consists of multiplexers that select the data, address, and control from one CPU
for communication with the memory module.

• Priority levels are established by the arbitration logic to select one CPU when two or more
CPUs attempt to access the same memory.

Figure 10.4: Crossbar switch

Figure 10.5: Block diagram of crossbar switch

• The multiplexers are controlled by the binary code that is generated by a priority encoder
  within the arbitration logic.
• A crossbar switch organization supports simultaneous transfers from all memory modules
  because there is a separate path associated with each module.
• However, the hardware required to implement the switch can become quite large and
  complex (a small sketch of the per-module arbitration follows).
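
As a rough illustration only, the following Python sketch models one arbitration cycle of a crossbar: each module grants at most one CPU per cycle, while requests aimed at different modules proceed in parallel. The fixed lower-index-wins priority and the crossbar_grants name are assumptions for the example.

```python
# A minimal sketch of crossbar arbitration: each memory module arbitrates
# independently, so requests to different modules are served simultaneously;
# conflicts on the same module go to the lower-numbered (higher-priority) CPU.

def crossbar_grants(requests):
    """requests: dict mapping cpu -> memory module it is addressing.
    Returns dict mapping module -> cpu granted for this cycle."""
    grants = {}
    for cpu in sorted(requests):           # ascending order models priority
        module = requests[cpu]
        if module not in grants:           # module's switch point still free
            grants[module] = cpu           # close the cross point for this CPU
    return grants

# CPU 0 and CPU 2 address different modules, so both transfer at once;
# CPU 1 loses the arbitration for MM1 to the higher-priority CPU 0.
print(crossbar_grants({0: "MM1", 1: "MM1", 2: "MM3"}))  # {'MM1': 0, 'MM3': 2}
```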

5. Explain Multistage Switching Network Interconnection Structures.


• The basic component of a multistage network is a two-input, two-output interchange switch.

Figure 10.6: Operation of a 2 x 2 interchange switch


• As shown in figure 10.6, the 2 X 2 switch has two inputs, labeled A and B, and two outputs,
  labeled 0 and 1.
• There are control signals (not shown) associated with the switch that establish the
  interconnection between the input and output terminals.
• The switch has the capability of connecting input A to either of the outputs. Terminal B of
  the switch behaves in a similar fashion.
• The switch also has the capability to arbitrate between conflicting requests.
• If inputs A and B both request the same output terminal only one of them will be connected;
the other will be blocked.
• Using the 2 X 2 switch as a building block, it is possible to build a multistage network to
  control the communication between a number of sources and destinations.
• To see how this is done, consider the binary tree shown in figure 10.7.

Figure 10.7: Binary tree with 2 x 2 switches

• The two processors P1 and P2 are connected through switches to eight memory modules
marked in binary from 000 through 111.

• The path from source to a destination is determined from the binary bits of the destination
number.
• The first bit of the destination number determines the switch output in the first level.
• The second bit specifies the output of the switch in the second level, and the third bit
  specifies the output of the switch in the third level.
• For example, to connect P1 to memory module 101, it is necessary to form a path from P1 to
  output 1 in the first-level switch, output 0 in the second-level switch, and output 1 in the
  third-level switch, as traced in the sketch below.
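
The bit-steered routing rule is easy to express in code. Here is a minimal Python sketch (the route name is an assumption) for a three-level binary tree like the one in figure 10.7.

```python
# A minimal sketch of destination-bit routing: the i-th bit of the
# destination number selects output 0 or 1 of the switch at level i.

def route(dest, levels=3):
    """Return the switch output taken at each level for a given destination."""
    bits = format(dest, f"0{levels}b")     # e.g. 5 -> '101'
    return [int(b) for b in bits]          # bit k steers the level-(k+1) switch

# Connecting P1 to memory module 101: output 1, then 0, then 1.
print(route(0b101))   # -> [1, 0, 1]
```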

6. Explain Hypercube Interconnection Structures.


• The hypercube or binary n-cube multiprocessor structure is a loosely coupled system
  composed of N = 2^n processors interconnected in an n-dimensional binary cube.
• Each processor forms a node of the cube.
• Each processor has direct communication paths to n other neighbor processors.
• These paths correspond to the edges of the cube.
• There are 2^n distinct n-bit binary addresses that can be assigned to the processors.
• Each processor address differs from that of each of its n neighbors by exactly one bit position.
• Figure 10.8 shows the hypercube structure for n = 1, 2, and 3.
• A one-cube structure has n = 1 and 2^n = 2. It contains two processors interconnected by a
  single path.
• A two-cube structure has n = 2 and 2^n = 4. It contains four nodes interconnected as a square.

Figure 10.8: Hypercube structures for n = 1,2,3


• A three-cube structure has eight nodes interconnected as a cube.
• An n-cube structure has 2^n nodes with a processor residing in each node.
• Each node is assigned a binary address in such a way that the addresses of two neighbors
differ in exactly one bit position.
• For example, the three neighbors of the node with address 100 in a three-cube structure are
000,110, and 101.
• Each of these binary numbers differs from address 100 by one bit value.
• For example, in a three-cube structure, node 000 can communicate directly with node 001.
• It must cross at least two links to communicate with 011 (from 000 to 001 to 011 or from
000 to 010 to 011).
• It is necessary to go through at least three links to communicate from node 000 to node 111.

• A routing procedure can be developed by computing the exclusive-OR of the source node
address with the destination node address.
• For example, in a three-cube structure, a message at 010 going to 001 produces an
  exclusive-OR of the two addresses equal to 011.
• The message can be sent along the second axis to 000 and then through the third axis to 001,
  as the sketch below traces.
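
A hedged Python sketch of this exclusive-OR routing follows; hypercube_route is an assumed name, and correcting the highest differing bit first is one possible axis order, chosen here because it reproduces the path in the example above.

```python
# A minimal sketch of XOR routing in an n-cube: XOR the current and
# destination addresses, then flip one differing bit (cross one axis)
# per hop until the message arrives at its destination node.

def hypercube_route(src, dst, n=3):
    path, node = [src], src
    while node != dst:
        diff = node ^ dst                      # bits that still differ
        axis = 1 << (diff.bit_length() - 1)    # correct the highest bit first
        node ^= axis                           # cross one link along that axis
        path.append(node)
    return [format(p, f"0{n}b") for p in path]

# Message at 010 going to 001: XOR = 011; via 000, then on to 001.
print(hypercube_route(0b010, 0b001))   # -> ['010', '000', '001']
```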

7. Explain Daisy Chain (Serial) Arbitration.


• Arbitration procedures service all processor requests on the basis of established priorities.
• The serial priority resolving technique is obtained from a daisy-chain connection of bus
arbitration circuits.
• The processors connected to the system bus are assigned priority according to their position
along the priority control line.
• The device closest to the priority line is assigned the highest priority.
• When multiple devices concurrently request the use of the bus, the device with the highest
priority is granted access to it.

Figure 10.9: Serial (daisy-chain) arbitration

• Figure 10.9 shows the daisy chaining connection of four arbiters.


• It is assumed that each processor has its own bus arbiter logic with priority-in (PI) and
  priority-out (PO) lines.
• The priority out (PO) of each arbiter is connected to the priority in (PI) of the next lower
priority arbiter.
• The PI of the highest-priority unit is maintained at logic 1 value.
• The highest-priority unit in the system will always receive access to the system bus when it
requests it.
• The PO output for a particular arbiter is equal to 1 if its PI input is equal to 1 and the processor
associated with the arbiter logic is not requesting control of the bus.
• This is the way that priority is passed to the next unit in the chain.
• When a processor requests control of the bus and the corresponding arbiter finds its PI input
  equal to 1, it sets its PO output to 0.
• Lower-priority arbiters receive a 0 in PI and generate a 0 in PO.
• Thus the processor whose arbiter has PI = 1 and PO = 0 is the one that is given control of
  the system bus; the sketch after this list steps through the chain.
• The busy line comes from open-collector circuits in each unit and provides a wired-OR logic
connection.
• If the line is inactive, it means that no other processor is using the bus.
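
Here is a minimal Python sketch of the PI/PO propagation just described (daisy_chain_grant is an assumed name): each arbiter passes priority on only when its processor is not requesting, and the unit seeing PI = 1 while driving PO = 0 wins the bus.

```python
# A minimal sketch of serial (daisy-chain) arbitration:
# PO = PI AND (not requesting); the arbiter with PI = 1 and PO = 0 wins.

def daisy_chain_grant(requests):
    """requests: list of booleans, index 0 = arbiter closest to the priority
    line (highest priority). Returns the granted index, or None."""
    pi = True                           # PI of the highest-priority unit is 1
    for i, requesting in enumerate(requests):
        po = pi and not requesting      # pass priority along if idle
        if pi and not po:               # PI = 1 and PO = 0: bus granted here
            return i
        pi = po                         # this PO feeds the next arbiter's PI
    return None

# Arbiters 2 and 4 request concurrently; the one nearer the line wins.
print(daisy_chain_grant([False, True, False, True]))   # -> 1
```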

8. Explain Parallel Arbitration Logic.


• The parallel bus arbitration technique uses an external priority encoder and a decoder as
shown in figure 10.10.
• Each bus arbiter in the parallel scheme has a bus request output line and a bus acknowledge
input line.
• Each arbiter enables the request line when its processor is requesting access to the system
bus.

Figure 10.10: Parallel arbitration

• The processor takes control of the bus if its acknowledge input line is enabled.
• The bus busy line provides an orderly transfer of control, as in the daisy-chaining case.
• Figure 10.10 shows the request lines from four arbiters going into a 4 X 2 priority encoder.
• The output of the encoder generates a 2-bit code which represents the highest-priority unit
among those requesting the bus.
• The bus priority-in (BPRN) and bus priority-out (BPRO) are used for a daisy-chain connection
of bus arbitration circuits.
• The bus busy signal BUSY is an open-collector output used to instruct all arbiters when the
bus is busy conducting a transfer.
• The common bus request (CBRQ) is also an open-collector output that serves to instruct the
arbiter if there are any other arbiters of lower-priority requesting use of the system bus.
• The signals used to construct a parallel arbitration procedure are bus request (BREQ) and
priority-in (BPRN), corresponding to the request and acknowledgement signals in figure
10.10.
• The bus clock (BCLK) is used to synchronize all bus transactions.
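
To illustrate the encoder/decoder pairing, here is a minimal Python sketch with assumed names (priority_encoder, decoder); it models the 4 X 2 priority encoder producing a 2-bit code and a decoder raising exactly one acknowledge line from that code.

```python
# A minimal sketch of parallel arbitration: the request lines feed a
# priority encoder, and its 2-bit output drives a decoder whose one-hot
# output enables the acknowledge line of the winning arbiter.

def priority_encoder(requests):
    """Four request lines in; index (2-bit code) of the winner out."""
    for i, requesting in enumerate(requests):
        if requesting:
            return i                   # smaller index = higher priority
    return None

def decoder(code, width=4):
    """2-bit code in; one-hot acknowledge lines out."""
    return [i == code for i in range(width)]

# Arbiters 1 and 2 request the bus; arbiter 1's acknowledge line is raised.
acks = decoder(priority_encoder([False, True, True, False]))
print(acks)   # -> [False, True, False, False]
```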

9. Explain Dynamic Arbitration Algorithms.


• A dynamic priority algorithm gives the system the capability for changing the priority of the
devices while the system is in operation.
Time slice
• The time slice algorithm allocates a fixed-length time slice of bus time that is offered
sequentially to each processor, in round-robin fashion.
• The service is location independent.
• No preference is given to any particular device since each is allotted the same amount of
time to communicate with the bus.

Polling
• In a bus system that uses polling, the bus grant signal is replaced by a set of lines called poll
lines which are connected to all units.
• These lines are used by the bus controller to define an address for each device connected to
the bus.
• The bus controller sequences through the addresses in a prescribed manner.
• When a processor that requires access recognizes its address, it activates the bus busy line
and then accesses the bus.
• After a number of bus cycles, the polling process continues by choosing a different processor.
• The polling sequence is normally programmable, and as a result, the selection priority can
be altered under program control.

LRU
• The least recently used (LRU) algorithm gives the highest priority to the requesting device
that has not used the bus for the longest interval.
• The priorities are adjusted after a number of bus cycles according to the LRU algorithm.
• With this procedure, no processor is favored over any other, since the priorities are
  dynamically changed to give every device an opportunity to access the bus; the sketch below
  illustrates the policy.
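
A minimal Python sketch of the LRU policy, with an assumed LRUArbiter class: the arbiter scans requesters from least to most recently used and moves the winner to the back of the recency order.

```python
# A minimal sketch of LRU bus arbitration: grant the requesting unit that
# has gone longest without the bus, then mark it most recently used.

from collections import deque

class LRUArbiter:
    def __init__(self, n):
        self.order = deque(range(n))       # front = least recently used

    def grant(self, requests):
        for unit in self.order:            # scan in least-recently-used order
            if requests[unit]:
                self.order.remove(unit)    # winner becomes most recently used
                self.order.append(unit)
                return unit
        return None

arb = LRUArbiter(4)
print(arb.grant([True, False, True, False]))   # -> 0
print(arb.grant([True, False, True, False]))   # -> 2 (unit 0 just had the bus)
```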

FIFO
• In the first-come first-serve scheme, requests are served in the order received.
• To implement this algorithm the bus controller establishes a queue arranged according to
the time that the bus requests arrive.
• Each processor must wait for its turn to use the bus on a first-in first-out (FIFO) basis.

Rotating daisy-chain
• The rotating daisy-chain procedure is a dynamic extension of the daisy chain algorithm.
• The highest priority is given to the unit that is nearest to the unit that has most recently
  accessed the bus (that unit becomes the bus controller).

10. Describe the cache coherence problem and its solutions in detail.


Cache coherence problem
• To ensure the ability of the system to execute memory operations correctly, the multiple
copies must be kept identical.
• This requirement imposes a cache coherence problem.
• A memory scheme is coherent if the value returned on a load instruction is always the value
given by the latest store instruction with the same address.
• Cache coherence problems exist in multiprocessors with private caches because of the need
to share writable data.
• Read-only data can safely be replicated without cache coherence enforcement mechanisms.
• To illustrate the problem, consider the three-processor configuration with private caches
  shown in figure 10.11.
• During the operation an element X from main memory is loaded into the three processors,
P1, P2, and P3.
• It is also copied into the private caches of the three processors.
• For simplicity, we assume that X = 52.
• The load on X to the three processors results in consistent copies in the caches and main
memory.
• If one of the processors performs a store to X, the copies of X in the caches become
inconsistent.
• A load by the other processors will not return the latest value.

Write-through policy
• As shown in figure 10.12, a store to X (of the value of 120) into the cache of processor P1
updates memory to the new value in a write-through policy.
• A write-through policy maintains consistency between memory and the originating cache,
but the other two caches are inconsistent since they still hold the old value.

Write-back policy
• In a write-back policy, main memory is not updated at the time of the store.
• The copies in the other two caches and main memory are inconsistent.
• Memory is updated eventually when the modified data in the cache are copied back into
memory.
• Another configuration that may cause consistency problems is a direct memory access (DMA)
activity in conjunction with an IOP connected to the system bus.
• In the case of input, the DMA may modify locations in main memory that also reside in cache
without updating the cache.
• During a DMA output, memory locations may be read before they are updated from the
cache when using a write-back policy.

Figure 10.11: Cache configuration after a load on X.

Figure 10.12: Cache configuration after a store to X by processor P1


Solution of cache coherence problem
Various schemes have been proposed to solve the cache coherence problem in shared memory
multiprocessors.
Disallow private caches
• A simple scheme is to disallow private caches for each processor and have a shared cache
memory associated with main memory.
• Every data access is made to the shared cache.
• This method violates the principle of closeness of CPU to cache and increases the average
memory access time.
• In effect, this scheme solves the problem by avoiding it.

Software Approaches
Read-Only Data are Cacheable
• This scheme allows only nonshared and read-only data to be stored in caches; such items are
  called cachable.
• Shared writable data are noncachable.
• The compiler must tag data as either cachable or noncachable, and the system hardware
makes sure that only cachable data are stored in caches.
• The noncachable data remain in main memory.
• This method restricts the type of data stored in caches and introduces an extra software
  overhead that may degrade performance.
Centralized Global Table
• A scheme that allows writable data to exist in at least one cache is a method that employs a
  centralized global table in its compiler.
• The status of memory blocks is stored in the central global table.
• Each block is identified as read-only (RO) or read and write (RW).
• All caches can have copies of blocks identified as RO.
• Only one cache can have a copy of an RW block.
• Thus if the data are updated in the cache with an RW block, the other caches are not affected
because they do not have a copy of this block.
Hardware Approaches
Hardware-only solutions are handled by the hardware automatically and have the advantage of
higher speed and program transparency.
Snoopy Cache Controller
• In the hardware solution, the cache controller is specially designed to allow it to monitor all
bus requests from CPUs and IOPs.
• All caches attached to the bus constantly monitor the network for possible write operations.
• Depending on the method used, they must then either update or invalidate their own cache
copies when a match is detected.
• The bus controller that monitors this action is referred to as a snoopy cache controller.
• This is basically a hardware unit designed to maintain a bus-watching mechanism over all the
caches attached to the bus.
• All the snoopy controllers watch the bus for memory store operations.
• When a word in a cache is updated by writing into it, the corresponding location in main
memory is also updated.
• The local snoopy controllers in all other caches check their memory to determine if they have
a copy of the word that has been overwritten.
• If a copy exists in a remote cache, that location is marked invalid.
• Because all caches snoop on all bus writes, whenever a word is written, the net effect is to
update it in the original cache and main memory and remove it from all other caches.
• If at some future time a processor accesses the invalid item from its cache, the response is
  equivalent to a cache miss, and the updated item is transferred from main memory. In this
  way, inconsistent versions are prevented; the sketch below models this behavior.
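
Finally, a minimal Python sketch of the write-invalidate snooping described above, under a write-through policy; the class names (Bus, SnoopyCache) and the dictionary-based model are illustrative assumptions.

```python
# A minimal sketch of write-invalidate snooping with write-through:
# a store updates the writer's cache and main memory, and every other
# snoopy controller invalidates its own copy of the written word.

class Bus:
    def __init__(self):
        self.memory = {}                   # main memory on the system bus
        self.caches = []                   # all caches snoop this bus

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}                    # address -> cached value
        bus.caches.append(self)

    def write(self, bus, addr, value):
        self.lines[addr] = value           # update the originating cache
        bus.memory[addr] = value           # write-through to main memory
        for cache in bus.caches:           # every controller snoops the write
            if cache is not self:
                cache.lines.pop(addr, None)   # mark remote copies invalid

    def read(self, bus, addr):
        if addr not in self.lines:         # invalidated copy acts like a miss
            self.lines[addr] = bus.memory[addr]   # refill from main memory
        return self.lines[addr]

bus = Bus()
p1, p2 = SnoopyCache(bus), SnoopyCache(bus)
p1.write(bus, "X", 52)
p2.write(bus, "X", 120)      # P2's store invalidates P1's copy of X
print(p1.read(bus, "X"))     # -> 120, refetched from updated main memory
```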
