
ACA Assignment 4

Model Paper

UNIT – 1

a. Define clock rate and CPI.


Answer - Clock rate is the number of clock pulses generated by the CPU in one
second. It is generally measured in MHz (megahertz) or GHz (gigahertz);
today's computers generally run at clock rates above one gigahertz. The clock
rate is usually determined by a quartz-crystal circuit.
CPI - In computer architecture, cycles per instruction (also clock cycles
per instruction, clocks per instruction, or CPI) is one aspect of
a processor's performance: the average number of clock
cycles per instruction for a program or program fragment. It is
the multiplicative inverse of instructions per cycle (IPC).
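As a quick worked illustration (all numbers below are invented, not from the
text), clock rate and CPI together determine execution time:

    # Minimal sketch: relating clock rate and CPI to CPU time.
    # All values are illustrative assumptions, not measurements.
    instructions = 2_000_000     # dynamic instruction count of a program
    cpi = 1.5                    # average clock cycles per instruction
    clock_rate = 2.0e9           # 2 GHz = 2e9 cycles per second

    cycles = instructions * cpi
    cpu_time = cycles / clock_rate
    print(f"CPU time = {cpu_time * 1e3:.2f} ms")   # 1.50 ms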
b. Define the terms multiprocessor and multicomputer.
Answer - Multiprocessor - A multiprocessor is a computer system in which two
or more central processing units (CPUs) share full access to a common
RAM. The main objective of using a multiprocessor is to boost the
system's execution speed; other objectives are fault tolerance
and application matching.
There are two types of multiprocessors: shared-memory multiprocessors
and distributed-memory multiprocessors. In a shared-memory
multiprocessor, all the CPUs share a common memory, whereas in a
distributed-memory multiprocessor, every CPU has its own private memory.
Multicomputer - A multicomputer is a computer system with
multiple processors that are connected together to solve a problem.
Each processor has its own memory, accessible only to that particular
processor, and the processors communicate with each other via an
interconnection network.
Because a multicomputer supports message passing between the
processors, a task can be divided among the processors to
complete it. Hence, a multicomputer can be used for distributed
computing. It is cost effective and easier to build a multicomputer than a
multiprocessor.
Fig: Multiprocessor and Fig: Multicomputer (figures not reproduced)
c. Explain the organization of a multiprocessor.
Answer – Parallel computing is a form of computing in which jobs are broken into
discrete parts that can be executed concurrently. Each part is further broken down
into a series of instructions, and instructions from each part execute simultaneously
on different CPUs. Parallel systems deal with the simultaneous use of multiple
computer resources, which can include a single computer with multiple processors, a
number of computers connected by a network to form a parallel processing cluster,
or a combination of both.
Parallel systems are more difficult to program than computers with a single
processor because the architecture of parallel computers varies, and the
processes of multiple CPUs must be coordinated and synchronized. Flynn's
classification gives four categories:
1. Single-instruction, single-data (SISD) systems –
An SISD computing system is a uniprocessor machine which is capable of
executing a single instruction, operating on a single data stream. In SISD,
machine instructions are processed in a sequential manner and computers
adopting this model are popularly called sequential computers. Most
conventional computers have SISD architecture. All the instructions and data to
be processed have to be stored in primary memory.
2. Single-instruction, multiple-data (SIMD) systems –
An SIMD system is a multiprocessor machine capable of executing the same
instruction on all its CPUs, each operating on a different data stream. Machines
based on the SIMD model are well suited to scientific computing, since it
involves many vector and matrix operations. The data elements of a vector can
be divided into multiple sets (N sets for an N-PE system) so that each
processing element (PE) can process one data set (a small code sketch follows
this list).
3. Multiple-instruction, single-data (MISD) systems –
An MISD computing system is a multiprocessor machine capable of executing
different instructions on different PEs, with all of them operating on the same
data set.
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same data set. Machines built
using the MISD model are not useful in most applications; a few machines have
been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine capable of executing
multiple instructions on multiple data sets. Each PE in the MIMD model has
separate instruction and data streams; therefore, machines built using this model
can handle any kind of application. Unlike SIMD and MISD machines, PEs in
MIMD machines work asynchronously.
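The SIMD sketch referred to in item 2 above: one instruction applied to whole
data streams at once, here mimicked with NumPy (the array contents are invented):

    # Minimal SIMD-style sketch: the same multiply-add instruction is
    # applied to every element of the data streams at once.
    import numpy as np

    a = np.array([1.0, 2.0, 3.0, 4.0])   # data set for PE 0..3
    b = np.array([5.0, 6.0, 7.0, 8.0])

    z = 2.0 * a + b    # one instruction, many data elements
    print(z)           # [ 7. 10. 13. 16.]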
d. Explain interconnection network architecture, comparing architectural features.
Answer - An interconnection network in a parallel machine transfers information from
any source node to any desired destination node. This task should be completed with
as little latency as possible, and the network should allow a large number of such
transfers to take place concurrently. Moreover, it should be inexpensive compared to
the cost of the rest of the machine.
The network is composed of links and switches, which help to send the information
from the source node to the destination node. A network is specified by its topology,
routing algorithm, switching strategy, and flow control mechanism.

Organizational Structure

Interconnection networks are composed of the following three basic components:

• Links − A link is a cable of one or more optical fibres or electrical wires, with a
connector at each end attached to a switch or network interface port. Through
it, an analog signal is transmitted from one end and received at the other to
recover the original digital information stream.
• Switches − A switch is composed of a set of input and output ports, an internal
"crossbar" connecting all inputs to all outputs, internal buffering, and control logic
to effect the input-output connection at each point in time. Generally, the
number of input ports is equal to the number of output ports.
• Network interfaces − A network interface behaves quite differently from switch
nodes and may be connected via special links. The network interface formats the
packets and constructs the routing and control information. It may have input
and output buffering, like a switch, and it may perform end-to-end error
checking and flow control. Hence, its cost is influenced by its processing
complexity, storage capacity, and number of ports.

Interconnection Network

Interconnection networks are composed of switching elements. Topology is the
pattern used to connect the individual switches to other elements, like processors,
memories and other switches. A network allows exchange of data between
processors in the parallel system.

• Direct connection networks − Direct networks have point-to-point connections
between neighbouring nodes. These networks are static, which means that the
point-to-point connections are fixed. Some examples of direct networks are rings,
meshes and cubes.
• Indirect connection networks − Indirect networks have no fixed neighbours. The
communication topology can be changed dynamically based on application
demands. Indirect networks can be subdivided into three parts: bus networks,
multistage networks and crossbar switches.
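As a concrete illustration of a direct network, the following sketch (an invented
example, not from the text) routes a message in a binary hypercube, where
neighbouring nodes differ in exactly one address bit:

    # Minimal sketch: dimension-order (e-cube) routing in a binary hypercube.
    # Node addresses are n-bit integers; neighbours differ in exactly one bit.
    def hypercube_route(src: int, dst: int, n: int) -> list[int]:
        """Return the sequence of nodes visited from src to dst."""
        path, node = [src], src
        for bit in range(n):               # correct one address bit per hop
            if (node ^ dst) & (1 << bit):
                node ^= 1 << bit           # hop across this dimension
                path.append(node)
        return path

    print(hypercube_route(0b000, 0b101, 3))   # [0, 1, 5]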

UNIT – 2

a. Define the term cache coherence problem.

Answer - Cache coherence is a concern raised in a multi-core system with distributed
L1 and L2 caches. Each core has its own L1 and L2 caches, and they need to be kept
in sync with each other so that every core sees the most up-to-date version of the
data.

The Cache Coherence Problem is the challenge of keeping multiple local caches
synchronized when one of the processors updates its local copy of data which is
shared among multiple caches.

Imagine a scenario where multiple copies of the same data exist in different caches
simultaneously; if the processors are allowed to update their own copies freely,
an inconsistent view of memory results.

b. Define interleaved memory organization.

Answer - Interleaving is a technique for compensating for the relatively slow speed
of DRAM (Dynamic RAM). In this technique, the main memory is divided into memory
banks which can be accessed individually, without any dependency on the others.
For example, if we have 4 memory banks (4-way interleaved memory), each
containing 256 bytes, then a block-oriented scheme (no interleaving) assigns
virtual addresses 0 to 255 to the first bank and 256 to 511 to the second bank. In
interleaved memory, virtual address 0 goes to the first bank, 1 to the second
bank, 2 to the third, 3 to the fourth, and then 4 to the first bank again.
Hence, the CPU can access consecutive addresses immediately, without waiting for a
single bank to complete the previous access, because the memory banks take turns
supplying the data.
Memory interleaving is a technique for increasing memory speed; it makes the
system more efficient, fast and reliable.
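A minimal sketch of the address mapping just described, assuming the same 4 banks:

    # Low-order interleaving: consecutive addresses rotate across banks.
    NUM_BANKS = 4

    def interleaved_map(addr: int) -> tuple[int, int]:
        """Return (bank, offset within bank) for a virtual address."""
        return addr % NUM_BANKS, addr // NUM_BANKS

    for addr in range(5):
        bank, offset = interleaved_map(addr)
        print(f"address {addr} -> bank {bank}, offset {offset}")
    # address 0 -> bank 0, offset 0 ... address 4 -> bank 0, offset 1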
Types of Interleaving
There are two common methods of interleaving memory:
1) 2-Way Interleaved - Two memory banks are accessed at the same time for writing
and reading operations.

2) 4-Way Interleaved - Four memory banks are accessed at the same time.

c. Briefly explain RISC and CISC scalar processors.

Answer - RISC Processor -

RISC stands for Reduced Instruction Set Computer. It is a type of microprocessor
that has a limited number of instructions. RISC processors can execute their
instructions very fast because the instructions are small and simple.
RISC chips require fewer transistors, which makes them cheaper to design and
produce. In RISC, the instruction set contains simple and basic instructions from
which more complex instructions can be composed. Most instructions complete in one
cycle, which allows the processor to handle many instructions at the same time.
Instructions are register based, and data transfer takes place from register to
register.

CISC Processor
• It is known as Complex Instruction Set Computer.
• It was first developed by Intel.
• It contains a large number of complex instructions.
• Instructions are not register based.
• Instructions cannot be completed in one machine cycle.
• Data transfer is from memory to memory.
• A microprogrammed control unit is found in CISC.
• CISC processors have variable instruction formats.

d. Discuss memory hierarchy technology in computer system.

Answer - Memory Organization in Computer Architecture


A memory unit is the collection of storage units or devices together.
The memory unit stores the binary information in the form of bits.
Generally, memory/storage is classified into 2 categories:

• Volatile memory: loses its data when power is switched off.
• Non-volatile memory: permanent storage that does not lose any data when power
is switched off.

The total memory capacity of a computer can be visualized as a hierarchy
of components. The memory hierarchy system consists of all storage
devices contained in a computer system, from slow auxiliary memory
to faster main memory and to still smaller, faster cache memory.
Auxiliary memory access time is generally about 1000 times that of main
memory; hence it is at the bottom of the hierarchy.
The main memory occupies the central position because it is equipped
to communicate directly with the CPU and with auxiliary memory
devices through an input/output (I/O) processor.
When a program that is not resident in main memory is needed by the CPU,
it is brought in from auxiliary memory. Programs not currently
needed in main memory are transferred into auxiliary memory to
provide space for other programs that are currently in use.
The cache memory is used to store the program data which is currently
being executed in the CPU. The approximate access time ratio between
cache memory and main memory is about 1 to 7-10.
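That access-time ratio can be turned into an average memory access time (AMAT);
the numbers below are invented, but consistent with the 1 to 7-10 ratio above:

    # Minimal sketch: average memory access time for cache + main memory.
    cache_hit_time = 1.0   # cycles to hit in the cache
    miss_penalty = 8.0     # extra cycles to fetch from main memory
    miss_rate = 0.05       # fraction of accesses that miss in the cache

    amat = cache_hit_time + miss_rate * miss_penalty
    print(f"AMAT = {amat:.2f} cycles")   # 1.40 cycles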

UNIT – 3

a. What is pipelining?
Answer - Pipelining is the process of feeding instructions to the processor through a
pipeline. It allows storing and executing instructions in an orderly process, and is
also known as pipeline processing.
Pipelining is a technique in which multiple instructions are overlapped during
execution. The pipeline is divided into stages, and these stages are connected with
one another to form a pipe-like structure. Instructions enter at one end and exit at
the other.
Pipelining increases the overall instruction throughput.
In a pipelined system, each segment consists of an input register followed by a
combinational circuit. The register is used to hold data, and the combinational
circuit performs operations on it. The output of the combinational circuit is applied
to the input register of the next segment.

b. What is the difference between arithmetic and instruction pipelining?

Answer - Arithmetic Pipeline

Arithmetic pipelines are found in most computers. They are used for floating point
operations, multiplication of fixed point numbers, etc. For example, the inputs to a
floating point adder pipeline are:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating point numbers),
while a and b are exponents.
The floating point addition or subtraction is done in 4 stages:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize and produce the result.

Registers are used for storing the intermediate results between the above
operations.
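A minimal Python sketch of these four stages, on invented (mantissa, exponent)
operands, ignoring rounding and special cases:

    # Sketch of 4-stage floating point addition: operands are (mantissa,
    # exponent) pairs meaning mantissa * 2**exponent.
    def fp_add(a_man, a_exp, b_man, b_exp):
        # Stage 1: compare the exponents (keep the larger one first).
        if a_exp < b_exp:
            a_man, a_exp, b_man, b_exp = b_man, b_exp, a_man, a_exp
        # Stage 2: align the mantissas by shifting the smaller operand.
        b_man = b_man / (2 ** (a_exp - b_exp))
        # Stage 3: add the mantissas.
        man, exp = a_man + b_man, a_exp
        # Stage 4: normalize the result into [1, 2).
        while man >= 2.0:
            man, exp = man / 2.0, exp + 1
        while 0.0 < man < 1.0:
            man, exp = man * 2.0, exp - 1
        return man, exp

    print(fp_add(1.5, 3, 1.0, 1))   # (1.75, 3): 12 + 2 = 14 = 1.75 * 2**3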

Instruction Pipeline
In an instruction pipeline, a stream of instructions is executed by
overlapping the fetch, decode and execute phases of the instruction cycle. This
technique is used to increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous
instructions are being executed in other segments of the pipeline. Thus we can
execute multiple instructions simultaneously. The pipeline is more efficient if
the instruction cycle is divided into segments of equal duration.
c. Define the terms collision vector & state diagram.

Answer – Collision vector - As a pipeline becomes more complicated, we can use a
collision vector to analyze the pipeline and control initiation of execution. The
collision vector is a method of analyzing how often we can initiate a new operation
into the pipeline and maintain synchronous flow without collisions. We construct the
collision vector by overlaying two copies of the reservation table, successively
shifting one clock to the right, and recording whether or not a collision occurs at
that step: if a collision occurs, record a 1 bit; if not, record a 0 bit.
For example, one reservation table (not reproduced here) results in the following
collision vector:

Collision vector = 011010

Using the collision vector, we construct a reduced state diagram that tells us when
we can initiate new operations.

State Diagram - The reduced state diagram is a way to determine when we can
initiate a new operation into the pipeline and avoid collisions with operations
already in process in the pipeline.
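A minimal sketch of the overlay construction (the reservation table here is a
made-up example, not the one that produced 011010):

    # Derive a collision vector from a reservation table.
    # Rows are pipeline stages, columns are clock cycles.
    table = [
        [1, 0, 0, 1],   # stage 0 is busy at cycles 0 and 3
        [0, 1, 1, 0],   # stage 1 is busy at cycles 1 and 2
    ]
    cycles = len(table[0])

    bits = []
    for shift in range(1, cycles):        # try initiating 'shift' cycles later
        collision = any(
            row[t] and row[t + shift]     # same stage needed by both ops
            for row in table
            for t in range(cycles - shift)
        )
        bits.append('1' if collision else '0')

    print("Collision vector =", ''.join(bits))   # '101' for this table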

d. Explain Tomasulo’s algorithm for dynamic instruction scheduling.

Answer - Tomasulo's algorithm is a computer architecture hardware algorithm for
dynamic scheduling of instructions that allows out-of-order execution and enables
more efficient use of multiple execution units. It was developed by Robert
Tomasulo at IBM in 1967 and was first implemented in the IBM System/360 Model
91's floating point unit.
The major innovations of Tomasulo's algorithm include register renaming in
hardware, reservation stations for all execution units, and a common data bus (CDB)
on which computed values are broadcast to all reservation stations that may need
them. These developments allow for improved parallel execution of instructions that
would otherwise stall under scoreboarding or other earlier algorithms.
The following are the concepts necessary to the implementation of Tomasulo's
Algorithm:
Common data bus
The Common Data Bus (CDB) connects reservation stations directly to functional
units. According to Tomasulo, it "preserves precedence while encouraging
concurrency". This has two important effects:

1. Functional units can access the result of any operation without involving a
floating-point register, allowing multiple units waiting on a result to proceed
without resolving contention for access to register file read ports.
2. Hazard detection and control of execution are distributed: the reservation
stations control when an instruction can execute, rather than a single
dedicated hazard unit.
Instruction order
Instructions are issued sequentially so that the effects of a sequence of instructions,
such as exceptions raised by these instructions, occur in the same order as they
would on an in-order processor, regardless of the fact that they are being executed
out-of-order (i.e. non-sequentially).
Register renaming
Tomasulo's Algorithm uses register renaming to correctly perform out-of-order
execution. All general-purpose and reservation station registers hold either a real
value or a placeholder value. If a real value is unavailable to a destination register
during the issue stage, a placeholder value is initially used. The placeholder value is a
tag indicating which reservation station will produce the real value. When the unit
finishes and broadcasts the result on the CDB, the placeholder will be replaced with
the real value.
Each functional unit has a single reservation station. Reservation stations hold
information needed to execute a single instruction, including the operation and the
operands. The functional unit begins processing when it is free and when all source
operands needed for an instruction are real.
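A highly simplified sketch of the tag mechanism (register names and values are
invented; real hardware does this in parallel, every clock cycle):

    # A register holds either a real value or a tag naming the reservation
    # station that will produce the value.
    class Tag:
        def __init__(self, station):
            self.station = station            # e.g. "RS1"

    regs = {"F0": 6.0, "F2": Tag("RS1"), "F4": 3.5}   # F2 waits on RS1

    def cdb_broadcast(station, value, regs):
        # The finishing unit broadcasts (tag, value) on the common data bus;
        # every placeholder carrying that tag is replaced by the real value.
        for name, v in regs.items():
            if isinstance(v, Tag) and v.station == station:
                regs[name] = value

    cdb_broadcast("RS1", 9.0, regs)
    print(regs)   # {'F0': 6.0, 'F2': 9.0, 'F4': 3.5}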
Exceptions
Practically speaking, there may be exceptions for which not enough status
information is available, in which case the processor may raise a special
exception, called an "imprecise" exception. Imprecise exceptions cannot
occur in in-order implementations, as the processor state is changed only in
program order (see RISC pipeline exceptions).
Programs that experience "precise" exceptions, where the specific instruction that
took the exception can be determined, can restart or re-execute at the point of the
exception. However, those that experience "imprecise" exceptions generally cannot
restart or re-execute, as the system cannot determine the specific instruction that
took the exception.

UNIT – 4

a. Discuss vector computers.

Answer - Vector- or array-processing computers are essentially designed to maximize
the concurrent activities inside a computer and to match the bandwidth of data flow
to the execution speed of the various subsystems. There are two major classes of
vector machines: pipeline computers and array processors. The design problems of
pipeline computers are illustrated by the Texas Instruments Advanced Scientific
Computer (TI-ASC), the Control Data STAR-100 and CYBER-205, the Cray Research
CRAY-1, and the Floating Point Systems AP-120B, while the Burroughs Scientific
Processor (BSP) and the Goodyear Aerospace Massively Parallel Processor (MPP)
illustrate the development of SIMD array processors. Evaluating such machines
involves the performance of pipeline and array processors, optimization techniques
for vector operations, and the hardware, software, and algorithmic issues of
vector-processing systems, as well as future trends of vector computers.
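A one-line illustration of the kind of operation such machines accelerate (NumPy
stands in for vector registers; the data is invented):

    # A whole-vector operation Z = 2*X + Y, the form that pipeline and
    # array processors execute efficiently.
    import numpy as np

    x = np.arange(8, dtype=np.float64)   # vector register X
    y = np.ones(8)                       # vector register Y
    z = 2.0 * x + y                      # a single vector instruction in spirit
    print(z)   # [ 1.  3.  5.  7.  9. 11. 13. 15.]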

b. Define hardware multi-threading.

Answer - In computer architecture, multithreading is the ability of a central
processing unit (CPU) (or a single core in a multi-core processor) to provide
multiple threads of execution concurrently, supported by the operating system. This
approach differs from multiprocessing. In a multithreaded application, the threads
share the resources of a single core or of multiple cores, which include the
computing units, the CPU caches, and the translation lookaside buffer (TLB).

Hardware multithreading allows multiple threads to share the functional units of a
single processor in an overlapping fashion, to utilize the hardware resources
efficiently. To permit this sharing, the processor must duplicate the independent
state of each thread. Hardware multithreading increases the utilization of a
processor.

There are two main approaches to hardware multithreading. Fine-grained
multithreading switches between threads on each instruction, resulting in
interleaved execution of multiple threads. This interleaving is often done in a
round-robin fashion, skipping any threads that are stalled at that clock cycle.
Coarse-grained multithreading was invented as an alternative: it switches threads
only on costly stalls, such as last-level cache misses.
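A toy sketch of the fine-grained round-robin policy just described (the thread
states are an invented abstraction):

    # Each cycle, issue from the next ready thread in round-robin order,
    # skipping any thread that is stalled at that clock cycle.
    threads = {0: "ready", 1: "stalled", 2: "ready", 3: "ready"}

    last = -1
    for cycle in range(6):
        for step in range(1, len(threads) + 1):   # probe in RR order
            tid = (last + step) % len(threads)
            if threads[tid] == "ready":
                print(f"cycle {cycle}: issue from thread {tid}")
                last = tid
                break
        else:
            print(f"cycle {cycle}: all threads stalled, bubble")
    # issues from threads 0, 2, 3, 0, 2, 3 - thread 1 is skipped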

c. Explain multilevel cache coherence.

Answer - Multilevel Cache Organisation

A cache is a small, fast random access memory used by the CPU to reduce the average
time taken to access memory.
Multilevel caches are one of the techniques to improve cache performance by
reducing the "miss penalty". Miss penalty refers to the extra time required to bring
the data into the cache from main memory whenever there is a "miss" in the cache.
For a clear understanding, consider an example where the CPU requires 10 memory
references to access the desired information, and compare this scenario across the
following 3 cases of system design:
Case 1: System design without cache memory

Case 2: System design with cache memory

Case 3: System design with multilevel cache memory
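A sketch of how the second level reduces the miss penalty seen by L1 (all latencies
and miss rates below are invented for illustration):

    # Miss-penalty reduction with a second cache level.
    l1_hit, l1_miss_rate = 1, 0.10     # cycles; fraction of L1 misses
    l2_hit, l2_miss_rate = 10, 0.20
    memory = 100                       # main memory access, cycles

    amat_l1_only = l1_hit + l1_miss_rate * memory
    amat_l1_l2 = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)
    print(f"AMAT, L1 only: {amat_l1_only:.1f} cycles")   # 11.0
    print(f"AMAT, L1 + L2: {amat_l1_l2:.1f} cycles")     #  4.0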

d. Explain the cache inconsistency problem in multiprocessors and how a snoopy
protocol can be used for cache coherence.

Answer - The Cache Coherence Problem

In a multiprocessor system, data inconsistency may occur among adjacent levels or
within the same level of the memory hierarchy. For example, the cache and the main
memory may have inconsistent copies of the same object.
As multiple processors operate in parallel, and multiple caches may independently
possess different copies of the same memory block, a cache coherence problem is
created. Cache coherence schemes help to avoid this problem by maintaining a
uniform state for each cached block of data.
Let X be an element of shared data which has been referenced by two processors, P1
and P2. In the beginning, the three copies of X (in the two caches and in memory)
are consistent. If processor P1 writes a new value X1 into its cache under a
write-through policy, the same copy is immediately written into the shared memory;
inconsistency now exists between P2's stale cached copy and the updated memory.
When a write-back policy is used, the main memory is updated only when the modified
data in the cache is replaced or invalidated.
In general, there are three sources of inconsistency problem −

• Sharing of writable data
• Process migration
• I/O activity

Snoopy Bus Protocols

Snoopy protocols achieve data consistency between the cache memory and the shared
memory through a bus-based memory system. Write-invalidate and write-update policies
are used for maintaining cache consistency.
Suppose three processors P1, P2, and P3 have a consistent copy of data element 'X'
in their local cache memories and in the shared memory (Figure-a, not reproduced).
Processor P1 writes X1 into its cache memory using the write-invalidate protocol,
so all other copies are invalidated via the bus; an invalidated copy is marked 'I'
(Figure-b) and must not be used. The write-update protocol instead updates all the
cache copies via the bus; with a write-back cache, the memory copy is also updated
(Figure-c).
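A toy Python sketch of write-invalidate snooping (the data structures are an
invented abstraction; states are 'V' valid and 'I' invalid, as above):

    # Each cache maps address -> (state, value); a write invalidates all
    # other cached copies via the shared bus and writes through to memory.
    caches = [{0x10: ("V", 5)} for _ in range(3)]   # P1, P2, P3 cache X = 5
    memory = {0x10: 5}

    def write(pid, addr, value):
        caches[pid][addr] = ("V", value)        # writer keeps a valid copy
        memory[addr] = value                    # write-through to memory
        for other, cache in enumerate(caches):  # bus snoop
            if other != pid and addr in cache:
                cache[addr] = ("I", cache[addr][1])   # invalidate

    write(0, 0x10, 9)   # P1 writes X1 = 9
    print(caches)       # P1 valid with 9; P2 and P3 marked 'I'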
UNIT – 5

a. Define the term critical section.

Answer – Critical Section - A critical section is a code segment that accesses
shared variables and has to be executed as an atomic action. This means that in a
group of cooperating processes, at a given point in time, only one process may be
executing its critical section. If any other process also wants to execute its
critical section, it must wait until the first one finishes (see the sketch after
the list below).

Solution to Critical Section Problem

A solution to the critical section problem must satisfy the following three
conditions:
1. Mutual Exclusion
2. Progress
3. Bounded Waiting
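A minimal Python sketch of mutual exclusion around a critical section (the shared
counter is an invented example):

    # A lock makes the shared read-modify-write a critical section.
    import threading

    counter = 0                    # shared variable
    lock = threading.Lock()

    def worker():
        global counter
        for _ in range(100_000):
            with lock:             # entry: at most one thread inside
                counter += 1       # critical section
                                   # exit: lock released by 'with'

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)                 # always 400000 with the lock held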

b. Define the term multiprogramming.

Answer - Multiprogramming is a rudimentary form of parallel processing in which
several programs are run at the same time on a uniprocessor. Since there is only
one processor, there can be no true simultaneous execution of different programs.
Instead, the operating system executes part of one program, then part of another,
and so on. To the user it appears that all programs are executing at the same time.

If the machine has the capability of causing an interrupt after a specified time
interval, then the operating system will execute each program for a given length of
time, regain control, and then execute another program for a given length of time,
and so on. In the absence of this mechanism, the operating system has no choice but
to begin to execute a program with the expectation, but not the certainty, that the
program will eventually return control to the operating system.

If the machine has the capability of protecting memory, then a bug in one program is
less likely to interfere with the execution of other programs. In a system without
memory protection, one program can change the contents of storage assigned to
other programs or even the storage assigned to the operating system. The resulting
system crashes are not only disruptive, they may be very difficult to debug since it
may not be obvious which of several programs is at fault.

c. Define & discuss synchronous message passing model.

Answer - In synchronous message passing, the components are processes, and processes
communicate in atomic, instantaneous actions called rendezvous. If two processes are
to communicate, and one reaches the point at which it is ready to communicate first,
it stalls until the other process is ready to communicate. "Atomic" means that the
two processes are simultaneously involved in the exchange, and that the exchange is
initiated and completed in a single uninterruptable step. Examples of rendezvous
models include Hoare's communicating sequential processes (CSP) [30] and Milner's
calculus of communicating systems (CCS) [49]. This model of computation has been
realized in a number of concurrent programming languages, including Lotos and Occam.
Rendezvous models are particularly well matched to applications where resource
sharing is a key element, such as client-server database models and multitasking or
multiplexing of hardware resources. A key weakness of rendezvous-based models is
that maintaining determinacy can be difficult; proponents of the approach, of
course, cite the ability to model nondeterminacy as a key strength.
Rendezvous models and process networks (PN) both involve threads that communicate
via message passing, synchronously in the former case and asynchronously in the
latter. Neither model intrinsically includes a notion of time, which can make it
difficult to interoperate with models that do include a notion of time. In fact,
message events are partially ordered, rather than totally ordered as they would be
were they placed on a time line.
Both models of computation can be augmented with a notion of time to
promote interoperability and to directly model temporal properties (see, for example, [50]).
In the Pamela system [51], threads assume that time does not advance while they are
active, but can advance when they stall on inputs, outputs, or explicitly indicate that time
can advance. By this vehicle, additional constraints are imposed on the order of events, and
determinate interoperability with timed models of computation becomes possible. This
mechanism has the potential of supporting low-latency feedback and configurable
hardware.
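A minimal Python sketch of a rendezvous: an unbuffered channel in which send and
receive both block until the other party arrives (the class is invented for
illustration):

    # Unbuffered (rendezvous) channel: send() blocks until a receiver
    # arrives, and recv() blocks until a sender arrives.
    import threading

    class RendezvousChannel:
        def __init__(self):
            self._send_lock = threading.Lock()     # one sender at a time
            self._ready = threading.Semaphore(0)   # item has been placed
            self._taken = threading.Semaphore(0)   # item has been taken
            self._item = None

        def send(self, item):
            with self._send_lock:
                self._item = item
                self._ready.release()   # wake a waiting receiver
                self._taken.acquire()   # stall until the exchange completes

        def recv(self):
            self._ready.acquire()       # stall until a sender is ready
            item = self._item
            self._taken.release()       # release the blocked sender
            return item

    ch = RendezvousChannel()
    threading.Thread(target=lambda: ch.send("hello")).start()
    print(ch.recv())                    # 'hello'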
d. Classify language features for parallelism.

Answer - Languages for parallel programming should meet four goals:

1. Expressiveness
2. Reliability
3. Security
4. Verifiability.

One line of work explores an algebra for networks consisting of a fixed number of
reactive units, communicating synchronously over a fixed linking structure. This
algebra has only two operators: disjoint parallelism, where two networks are
composed in parallel without any interconnections, and linking, whereby an
interconnection is formed between two ports. The intention is that these operators
correspond to the primitive steps in constructing networks. The algebra is simpler
than existing process algebras, and its expressive power has been investigated,
with two results:

(1) Expressibility of behaviours: with only three simple processing units, every
finite-state behaviour can be constructed.

(2) Expressibility of operators: the network operators which are expressible
within the algebra have been characterised.


A representative set of language features for describing processes and process
interaction introduces two constructs, resources and protected variables, as the
mechanisms for describing interaction. Resources are extensions of Hoare's monitor
concept; protected variables are global variables which can only be accessed by one
process at a time. Two types of access control are introduced: restrictions on
scope rules for static access, and capabilities for dynamic access. Examples
include the interface to machine devices, files and virtual devices, device
scheduling, device reservation, and buffer allocation.
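A minimal Python sketch of a protected variable in the sense above: a global value
that only one thread can access at a time (the class is invented for illustration):

    # All access to the value goes through methods that hold a lock,
    # so only one thread touches it at a time.
    import threading

    class ProtectedVariable:
        def __init__(self, value):
            self._value = value
            self._lock = threading.Lock()

        def get(self):
            with self._lock:
                return self._value

        def update(self, fn):
            with self._lock:           # read-modify-write as one atomic step
                self._value = fn(self._value)
                return self._value

    balance = ProtectedVariable(100)
    balance.update(lambda v: v + 50)
    print(balance.get())               # 150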
