INTERNAL ASSIGNMENT

NAME SNEHA SANKHLA

ROLL NUMBER 2214505216


PROGRAM MASTER OF COMPUTER APPLICATIONS (MCA)
SEMESTER 1

COURSE CODE & NAME DCA6105 – COMPUTER ARCHITECTURE

SESSION SEPTEMBER 2022


SET 1
ANSWER 1[A]:
Based on the underlying technology, computer architecture can be classified into the following generations:
Zeroth Generation [1642 – 1945]:
• Computers in this generation are based on mechanical parts and gears.
• In 1642, French scientist Blaise Pascal invented a mechanical calculator (Pascal’s
calculator). It used mechanical gears and was powered by hand.
• In 1821, English mathematician Charles Babbage proposed the Difference Engine, a steam-powered mechanical calculator designed to tabulate the values of polynomial functions.
• In 1833, Babbage designed the Analytical Engine, which was intended to be a general-purpose, programmable computer that accepted input via punched cards and printed its output on paper.
First Generation [1945 – 1954]
• Computers in this generation used vacuum tubes in their architecture.
• In the mid-1940s, electromagnetic relays began to be replaced by vacuum tubes, resulting in machines whose only moving parts were electrons.
• A vacuum tube is a small glass tube with all of its gas removed, which allows electrons to move with minimal interference.
• Vacuum tubes were up to 1000 times faster than electromagnetic relays and could also be
used to construct fast storage devices.
• Computers of this generation include the Universal Automatic Computer (UNIVAC), the Electronic Discrete Variable Automatic Computer (EDVAC), and the Electronic Numerical Integrator and Computer (ENIAC).
Second Generation [1954 - 1963]
• Computers with transistors as the main electronic component and magnetic-core memory.
• Vacuum tubes, in addition to being large, dissipated an enormous amount of heat and
tended to burn out often.
• A transistor is a piece of silicon whose conductivity can be turned on or off using an electric
current.
• Transistors are much smaller, cheaper, more reliable and more energy efficient, which
allowed the machines to be much smaller and faster.
• Machines of this generation include the IBM 7070, IBM 1401, and PDP-1.
Third Generation [1963 – 1973]
• Computers featuring Integrated Circuits.
• Second-generation machines still suffered from high maintenance requirements and were costly.
• Thus, transistors were replaced by integrated circuits (ICs), which had hundreds of transistors fabricated on a silicon substrate and connected by conductive layers to form complete circuits.
• These computers needed less maintenance, consumed less energy and were more reliable
with enhanced main memory storage.
• Computers of this generation include IBM 360 Series, UNIVAC 1108/1106, and
Honeywell 6000 Series.
Fourth Generation [1973 – 1985]
• These computers used Large-Scale Integrated Circuits (LSI) and Very Large-Scale Integrated Circuits (VLSI).
• The transition from the third to the fourth generation is based on scale: VLSI allowed the integration of thousands to millions of transistors on a single IC chip.
• These computers have remarkably high computing power and multi-megabyte RAM, and provide more complex functionality in the same amount of space.
• Examples include IBM 3033, Sharp PC-1211 and so on.
Fifth Generation [1985 – present]
• Computers integrated with parallel processing and networking.
• Parallel processing refers to the integration of multiple processors in a single computer.
• By sharing the computational load across multiple processors, a parallel computer can execute programs in a fraction of the time.

ANSWER 1[B]:
Concurrent execution is the execution of multiple tasks in overlapping time frames, i.e., to make progress on multiple tasks the CPU switches between them during execution.
Parallel execution is the process of executing multiple computational tasks simultaneously by splitting a task into smaller subtasks that can be processed in parallel (on multiple CPUs).
• Execution: In concurrent computing, execution can be done on either a single processor or multiple processors. In parallel computing, multiple processors are needed for simultaneous execution.
• Benefits: Concurrent computing allows a larger amount of work to be in progress at one time. Parallel computing improves throughput and increases computational speed. (A minimal sketch contrasting the two follows this list.)
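A minimal Python sketch of the contrast (the task function, workload, and worker counts are illustrative assumptions): the same batch of tasks is run concurrently with a thread pool (one interpreter switching between tasks) and in parallel with a process pool (subtasks spread across CPUs).

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(n):
    # stand-in for one unit of computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    work = [200_000] * 8

    # Concurrent execution: threads overlap in time, switching on one CPU.
    with ThreadPoolExecutor(max_workers=4) as pool:
        concurrent_results = list(pool.map(task, work))

    # Parallel execution: subtasks run simultaneously on multiple CPUs.
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel_results = list(pool.map(task, work))

    assert concurrent_results == parallel_results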
Flynn’s Taxonomy: a classification based on the concurrency of instruction and data streams available in the architecture.
• Single Instruction, Single Data (SISD): an architecture in which a single-core processor fetches a single instruction stream (IS) from memory to run on a single data stream (DS), i.e., one operation at a time. This corresponds to the von Neumann architecture.
• Single Instruction, Multiple Data (SIMD): a computing method in which each processing unit executes the exact same instruction at a given moment, just on different data. The instructions can be executed sequentially using pipelining, or in parallel using multiple processors.
• Multiple Instruction, Single Data (MISD): a parallel computing architecture where multiple processing units perform different instructions on a single data stream. This architecture is used for fault tolerance, where instructions are executed redundantly on the same data to detect and mask errors.
• Multiple Instruction, Multiple Data (MIMD): an architecture in which multiple processors execute different instructions on different data. It is built from multiple processors and memory modules connected via an interconnection network.
--------------------------------------------------------------------------------------------------------------------

ANSWER 2[A]:
CPU organization can be classified into four categories based on the maximum number of operands explicitly specified in an instruction. The classification is as follows:
Zero-address instructions:
• Stack machines take their operands from the stack and place results back on the stack; hence these instructions have no address field.
• The following program shows how X = (A*B)/C + (D+E)*F is written for a stack-based computer (a Python sketch of such a machine follows the listing):
PUSH A TOS ← A
PUSH B TOS ← B
MUL TOS ← (A * B)
PUSH C TOS ← C
DIV TOS ← (A * B)/C
PUSH D TOS ← D
PUSH E TOS ← E
ADD TOS ← (D + E)
PUSH F TOS ← F
MUL TOS ← (D + E) ∗ F
ADD TOS ← (D + E) ∗ F + (A ∗ B)/C
POP X M [X] ← TOS
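A minimal Python sketch of such a zero-address machine (the instruction tuples and sample memory values are assumptions chosen only for illustration); it evaluates the same expression using only PUSH, POP, and stack-to-stack arithmetic.

def run(program, memory):
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(memory[arg[0]])
        elif op == "POP":
            memory[arg[0]] = stack.pop()
        else:  # MUL, DIV, ADD take both operands from the top of the stack
            b, a = stack.pop(), stack.pop()
            stack.append({"MUL": a * b, "DIV": a / b, "ADD": a + b}[op])
    return memory

program = [("PUSH", "A"), ("PUSH", "B"), ("MUL",),
           ("PUSH", "C"), ("DIV",),
           ("PUSH", "D"), ("PUSH", "E"), ("ADD",),
           ("PUSH", "F"), ("MUL",), ("ADD",), ("POP", "X")]

memory = {"A": 2, "B": 3, "C": 6, "D": 1, "E": 4, "F": 2, "X": None}
print(run(program, memory)["X"])   # (2*3)/6 + (1+4)*2 = 11.0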
One-address instruction:
• These instructions run on an accumulator-based machine, where a special register, the accumulator, is used as an implicit operand for arithmetic instructions.
• Hence, the instructions usually contain one address field and assume that the accumulator holds the result of every operation.
• Program to evaluate X = (A*B)/C + (D+E)*F:
LOAD A AC ← M [A]
MUL B AC ← AC ∗ M [B]
DIV C AC ← AC / M [C]
STORE T M [T] ← AC
LOAD D AC ← M [D]
ADD E AC ← AC + M [E]
MUL F AC ← AC ∗ M [F]
ADD T AC ← AC + M [T]
STORE X M [X] ← AC
Two-address instruction:
• These instructions have two address fields, and each address field can specify a processor register or a memory word.
• The program to evaluate X = (A*B)/C + (D+E)*F is as follows:
MOV R1, A R1 ← M [A]
MUL R1, B R1 ← R1 * M [B]
DIV R1, C R1 ← R1 / M [C]
MOV R2, D R2 ← M [D]
ADD R2, E R2 ← R2 + M [E]
MUL R2, F R2 ← R2 * M [F]
ADD R1, R2 R1 ← R1 + R2
MOV X, R1 M [X] ← R1
Three-address instruction:
• These instructions have three address fields, and each can be used to specify either a
processor register or a memory operand.
• The program in assembly language that evaluates X = (A*B)/C + (D+E)*F is as follows:
MUL R1, A, B R1 ← M [A] * M [B]
DIV R1, R1, C R1 ← R1 / M [C]
ADD R2, D, E R2 ← M [D] + M [E]
MUL R2, R2, F R2 ← R2 * M [F]
ADD X, R1, R2 M [X] ← R1 + R2

ANSWER 2[B]:
Several types of addressing modes are:
• Implied mode: In this addressing mode the instructions do not carry a source or destination address; these addresses are implied by the opcode.
Example: Accumulator-based and stack-based machines use this mode, as the instructions implicitly reference the single accumulator or the stack (in the case of stack machines).
• Immediate Mode: In this mode the instruction carries an immediate operand, i.e., the operand is not fetched from memory; its value is contained in the instruction itself.
Example: MOV #$FEED, D0; it moves the immediate value #$FEED to register D0.
• Register Direct Mode: In this mode all the operands are in registers and the result is also placed in a register, hence no memory reference is required.
Example: AC ← [R1] + [R2]; it moves the sum of two registers to the accumulator.
• Register Indirect: In this mode, the address of the operand is stored in a CPU register; the contents of that register form the effective address of the operand.
Example: AC ← AC + [[R]]
• Register Autoincrement and Autodecrement: This is like register indirect mode, but after the operand is accessed, the value of the base register is incremented or decremented by the size of the data item. This approach can be used to traverse the elements of an array or vector.
• Direct Addressing: In this mode, the instruction contains the absolute address of the operand, without any modification or further reference.
Example: AC ← AC + [X]; adds the value at address X to the accumulator.
• Indirect Addressing: In this mode, the instruction’s address field holds the address of a memory location that contains the effective address of the operand.
Example: AC ← AC + [[X]]
• Relative Addressing: In this mode, the address field is a signed number which is added to the address of the next instruction to obtain the effective address.
• Indexed Addressing: In this mode, the effective address is obtained by adding the contents of an index register to the address field of the instruction. The address field may hold the beginning address of an array (see the sketch after this list).
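A small Python sketch with illustrative (assumed) addresses and values rather than any real ISA, showing how the effective address is formed under direct, indirect, and indexed addressing.

memory = {100: 25, 200: 100, 300: 7, 301: 8, 302: 9}
registers = {"AC": 0, "X": 2}          # X plays the role of an index register

# Direct: the instruction holds the operand's address itself.
ea = 100
registers["AC"] += memory[ea]          # AC <- AC + [100] = 25

# Indirect: the address field points to a location that holds the address.
ea = memory[200]                       # [200] = 100, the effective address
registers["AC"] += memory[ea]          # AC <- AC + [[200]] = 50

# Indexed: effective address = address field + index register.
ea = 300 + registers["X"]              # 300 + 2 = 302
registers["AC"] += memory[ea]          # AC <- AC + [302] = 59

print(registers["AC"])                 # 59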
--------------------------------------------------------------------------------------------------------------------

ANSWER 3[A]:
Pipeline hazards are problems in an instruction pipeline that occur when the next instruction cannot execute in the current clock cycle, leading to stalls or incorrect results.
Pipeline hazards are classified into: Structural Hazards, Data Hazards, and Control Hazards.
Data Hazards:
• Data hazards occur when instructions exhibit data dependency: an instruction depends on the result of a prior instruction, and the two access the data in different stages of the pipeline.
• Situations in which a data hazard can occur are (a small detection sketch follows this list):
▪ Read after Write (RAW): occurs when an instruction tries to read the result of a prior instruction before the prior instruction has written it.
▪ Write after Read (WAR): occurs when instruction 2 tries to write to a destination before instruction 1 has read it.
▪ Write after Write (WAW): occurs when instruction 2 tries to write an operand before instruction 1 has written to it.
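A minimal Python sketch (the registers and instruction pair are made up for illustration) that classifies the dependence between two instructions, each modelled simply as a destination register and a set of source registers.

def hazards(first, second):
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = []
    if dest1 in srcs2:     # the second instruction reads what the first writes
        found.append("RAW")
    if dest2 in srcs1:     # the second writes what the first still has to read
        found.append("WAR")
    if dest1 == dest2:     # both write the same register
        found.append("WAW")
    return found

# ADD R1, R2, R3  followed by  SUB R4, R1, R5  -> RAW on R1
print(hazards(("R1", {"R2", "R3"}), ("R4", {"R1", "R5"})))   # ['RAW']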
Structural Hazards:
• Structural hazards occur when the hardware cannot support a particular combination of instructions, i.e., when two or more instructions in the pipeline try to access the same resource.
Example: multiple instructions are ready to enter the execute phase and there is only one ALU, or multiple instructions try to access memory at the same time.
Control Hazards:
• This hazard occurs when branches break the pipeline because of the delay in fetching instructions and making the branch decision.
• The pipeline may make an incorrect branch prediction and fetch instructions into the pipeline that have to be discarded later.
• Instructions that can introduce control hazards are: Conditional and Unconditional
branches, Indirect branches, Procedure calls, and Procedure returns.
Techniques to handle pipeline hazards are:
• Pipeline Bubbling: Used to eliminate all three types of hazards. If the control logic detects a future hazard, it inserts no-operations (NOPs, or “bubbles”) into the pipeline. This gives the current instruction sufficient time to complete execution and delays the fetching of the new instruction (that could cause a hazard).
• Operand Forwarding: Used to eliminate data hazards. Interface (pipeline) registers store the intermediate outputs between stages, and dependent instructions can read the values from these registers directly (see the stall-count sketch after this list).
• Out-of-order execution: This can avoid the use of pipeline bubbles. It is an algorithm that lets the CPU execute instructions based on the availability of input data and execution units, instead of in their original program order.
• Pipeline stall cycles: This approach is used to eliminate control hazards. The pipeline is frozen until the destination of the branch is determined, and then fetching resumes. It can add many stall cycles if the instruction mix contains many branch instructions or if the pipeline is deep.
• Resource modifications: In case of a structural hazard, adding more hardware resources to the pipeline or replicating resources can reduce the probability of it occurring.
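A rough illustration only: the sketch below assumes a classic five-stage IF-ID-EX-MEM-WB pipeline in which registers are written in WB and read in ID, and forwarding paths feed the EX stage directly; under those assumptions it counts the bubbles needed for a back-to-back RAW pair with and without operand forwarding.

def stalls_for_raw(distance, forwarding):
    """distance = number of instruction slots between producer and consumer
    (1 means they are back to back)."""
    if forwarding:
        # EX/MEM and MEM/WB forwarding feed EX directly, so an ALU result
        # is usable by the very next instruction with no bubbles.
        needed_gap = 0
    else:
        # The consumer's ID stage must follow the producer's WB stage,
        # which requires a gap of three cycles.
        needed_gap = 3
    return max(0, needed_gap - (distance - 1))

print(stalls_for_raw(1, forwarding=False))  # 3 bubbles
print(stalls_for_raw(1, forwarding=True))   # 0 bubbles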

ANSWER 3[B]:
Dynamic Scheduling: A technique in which the hardware determines which instruction to execute next in order to reduce stalls, while maintaining data flow and exception behavior. If an instruction is stalled, other instructions can be issued and executed provided they do not depend on any active or stalled instruction. It implies out-of-order execution and out-of-order completion.
The Tomasulo approach:
• It is a hardware algorithm for dynamic scheduling.
• Key concepts include register renaming, reservation stations for all execution units, and a common data bus (CDB).
• Instruction lifecycle: Each instruction passes through three stages.
▪ Issue: An instruction is issued if a reservation station is free; otherwise it is stalled. Registers are renamed to avoid WAR and WAW hazards (a small renaming sketch follows this list).
▪ Execute: Instructions are executed when all their operands become available. If an operand is unavailable, the instruction is delayed until the operand appears on the CDB. If the instruction is a load, it executes as soon as the memory unit is available; if it is a store, it waits for the value to be stored and then sends it to the memory unit; otherwise, if the instruction is an ALU operation, it executes at the corresponding functional unit (FU).
▪ Write Result: If the instruction is an ALU operation, the result is written to the CDB and from there into the registers (and any waiting reservation stations). If the instruction is a store, the data is written to memory.
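A tiny Python sketch of the renaming done at issue time (the register names and reservation-station tags are invented for illustration): every destination gets a fresh tag, so a later write to the same architectural register can no longer conflict with earlier reads or writes, which removes WAR and WAW hazards.

from itertools import count

tags = count(1)
rename = {}                       # architectural register -> current tag

def issue(dest, srcs):
    # sources are read through the map (or from the register file if unmapped)
    src_tags = [rename.get(r, r) for r in srcs]
    new_tag = f"RS{next(tags)}"
    rename[dest] = new_tag        # later readers of `dest` will wait on this tag
    return new_tag, src_tags

print(issue("F2", ["F0", "F4"]))  # ('RS1', ['F0', 'F4'])
print(issue("F2", ["F2", "F6"]))  # ('RS2', ['RS1', 'F6'])  - WAW on F2 removed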
--------------------------------------------------------------------------------------------------------------------

SET 2
ANSWER 4[A]:
Cache mapping is a policy that determines where a particular memory block is placed when it is brought into the cache.
Cache mapping is also known as the cache placement policy.
The three policies available for cache placement are: direct mapping, fully associative mapping, and set-associative mapping.
Direct mapping:
• The cache is divided into sets, with a single cache line in each set (it can be visualized as an n x 1 column matrix). A memory block is assigned to a cache line based on its address, so each memory block can be stored in only one particular cache line.
• The memory address of the incoming block is divided into tag, index, and offset fields (from MSB to LSB).
• The tag is used to distinguish between different memory blocks that map to the same cache set.
• The index determines which cache set the memory block goes into.
• The offset selects the desired data within the cache line.
• Placing a block in the cache:
▪ To determine which cache set a memory block should go to, the index bits are used. The index bits are derived from the memory block address.
▪ The memory block is stored in the corresponding cache set, and the address’s MSBs are stored in the tag field of the cache entry.
▪ If the cache line is already occupied, the new entry replaces it.
• Searching for a cache entry:
▪ The set is identified using the index bits of the memory address.
▪ The tag bits of the memory address are compared with the tag bits of the set. If they match, it is a cache hit and the cache block is sent to the processor; if it is a cache miss, the memory block is brought into the cache (an address-breakdown sketch follows this list).
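A minimal Python sketch of this address breakdown; the cache geometry (8 sets of 16-byte lines) and the sample address are assumptions chosen only to illustrate the arithmetic.

BLOCK_SIZE = 16      # bytes per cache line
NUM_SETS = 8         # one line per set (direct mapped)

def split_address(addr):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SETS
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    return tag, index, offset

tag, index, offset = split_address(0x2A7)
print(tag, index, offset)   # 5 2 7  -> the block maps to set 2, byte 7 of the line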
Associative mapping:
• In this technique, a memory block can be stored in any unused cache line.
• The cache is organized as a single cache set consisting of multiple cache lines; it can be visualized as a 1 x m row matrix.
• The incoming memory address is divided into an offset and a tag.
• The offset is used to determine which byte to access within the cache line.
• The tag is used to map memory blocks to cache lines.
• Placing a block in the cache:
▪ A cache line is selected based on its valid bit. If the valid bit is 0, the memory block is placed into it; otherwise another line with a valid bit of 0 is selected.
▪ If the cache is full, an existing block is evicted to make room for the new block.
• Searching for a cache entry:
▪ The tag field is used to search for the required memory block.
▪ On a hit, the content is sent to the processor according to the offset field; otherwise the content is fetched from memory.

ANSWER 4[B]:
Vector Processor: A CPU whose instructions are designed to operate on large one-dimensional arrays of data called vectors. Execution over the elements is carried out in parallel automatically.
Two types of vector processing are:
• Vector Memory-Memory Architecture: In these architectures, all operands are fetched from and results stored to main memory. This approach takes longer because of main-memory access. Examples of memory-memory machines are the CDC STAR-100 (’73) and TI ASC (’71).
• Vector-Register Architecture: In this architecture, the vector operations (except load and store) take place between vector registers. The intermediate results of operations are stored in registers rather than memory. This approach is faster than the previous one, as register access is faster than main-memory access. An example of a vector-register machine is the CRAY-1 (’76). A small scalar-versus-vector sketch follows.
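As a loose analogy only (NumPy arrays are not hardware vector registers, and the array sizes here are arbitrary), the sketch below contrasts scalar, one-element-at-a-time execution with a single whole-array operation in the vector-register style.

import numpy as np

a = np.arange(8, dtype=np.float64)   # imagine vector register V1
b = np.ones(8, dtype=np.float64)     # imagine vector register V2

# Scalar style: one addition per loop iteration (one result per instruction).
scalar = np.empty_like(a)
for i in range(len(a)):
    scalar[i] = a[i] + b[i]

# Vector style: one whole-array operation, analogous to V3 <- V1 + V2.
vector = a + b

assert np.array_equal(scalar, vector)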
--------------------------------------------------------------------------------------------------------------------
ANSWER 5[A]:
Granularity is the measure of the amount of computation done by a task.
Fine-grained:
• In fine-grained parallelism, a program is broken into a substantial number of small tasks, and each task is assigned to one of multiple processors.
• Since the program is divided into an enormous number of parts, the amount of work in each task is low.
• Because each task needs little processing, a high number of processors is required to complete the processing. This in turn increases the communication and synchronization overhead between processors.
• Since the work is evenly distributed across the processors, it facilitates load balancing.
• Workload scheduling is important, since many tasks are computed in parallel.
• A shared-memory architecture is suitable for fine-grained processing, as it has low communication overhead.
• Fine-grained parallelism can be achieved at the loop level with a grain size of around 500 instructions, and at the instruction level with a grain size of around 20 instructions.
Coarse-grained:
• In coarse-grained parallelism, a program is broken into large chunks of work that are divided among the processors.
• Due to the generous size of the sections, the amount of computation a processor does is high.
• The advantage is low communication and synchronization overhead, since the number of parallel processors is small.
• This parallelism can lead to load imbalance: since tasks can be unevenly distributed, some processors may process bulk data while others sit idle.
• It can fail to exploit parallelism fully, since the tasks are large and most of the computation within them is performed sequentially on a single processor.
• This parallelism is suitable for message-passing architectures, as those systems take a long time to communicate between processes.
• Coarse-grained parallelism is used at the program level, with grain sizes of tens of thousands of instructions (a small chunking sketch follows this list).
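The difference in grain size can be pictured with the small Python sketch below (the work list and chunk sizes are arbitrary assumptions): fine-grained chunking creates many small tasks that balance easily but add scheduling and communication overhead, while coarse-grained chunking creates a few large tasks with the opposite trade-off.

def chunk(work, grain):
    """Split `work` into tasks of roughly `grain` items each."""
    return [work[i:i + grain] for i in range(0, len(work), grain)]

work = list(range(1000))
fine = chunk(work, 10)      # 100 small tasks: good load balance, more overhead
coarse = chunk(work, 250)   # 4 large tasks: little overhead, risk of imbalance
print(len(fine), len(coarse))   # 100 4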

ANSWER 5[B]:
Shared memory is memory that allows multiple programs to access it simultaneously, both to enable communication amongst them and to avoid redundant copies. Shared-memory systems may use uniform memory access (UMA), non-uniform memory access (NUMA), or cache-only memory architecture (COMA).
Uniform Memory Access (UMA):
• It is a shared-memory architecture in which each processor is granted equal access to the memory, in the same way a single-processor system accesses its memory.
• All processors access the memory through an interconnection network, which can be a single bus, multiple buses, or a crossbar switch.
• The access time to memory is independent of which processor makes the access, for how long, or which memory location is being accessed. This means less programming effort is needed, as the processors are not differentiated.
• Along with the common memory, each processor has a private cache. This cache serves its own processor efficiently, rather than serving the entire network poorly.
• This is the most popular organization in shared memory systems. Examples of these
systems are Sun Starfire servers, HP V series, Compaq Alpha Server GS, Silicon Graphics
Inc. multiprocessor servers.
Non-uniform Memory Access:
• In this memory architecture, a processor has its own local memory, and it can access
memories of other processors.
• NUMA allows all the processors to access each other’s memory directly, as the CPU sees
it as a single, linear address space.
• For a processor, access to local memory is faster than non-local memory. Non-local
memory can be another processor’s memory or memory shared between processors.
• The access time to memory modules depends on its distance from the processor, which
results in non-uniform memory access time.
• The limitation of UMA is that when a processor is added, the shared bus becomes overloaded, which results in a performance bottleneck. To prevent this, NUMA adds some intermediate (local) memory to the processors, so that not all data accesses have to be performed over the shared bus.
• The nodes of a NUMA architecture can be connected in many forms, among them tree and hierarchical structures.
• One of the main benefits of NUMA is its support for scaling, which makes it suitable for
server-side applications and decision support systems.
Cache-only Memory Architecture:
• This architecture is like NUMA in that each processor holds a part of the shared memory; however, the shared memory at each processor is used as a cache.
• The COMA hardware replicates the data and migrates it to the cache of the processor that
is accessing it.
• The address space consists of all the cache and there is a cache directory which helps in
remote cache access.
--------------------------------------------------------------------------------------------------------------------

ANSWER 6[A]:
RAID stands for Redundant Array of Independent Disks. It is a group of physical disk drives that function as one to increase storage and improve performance. Data is spread or copied across multiple drives for faster throughput, data redundancy, fault tolerance, error correction, and improved mean time between failures. RAID is classified into levels based on how data is stored. The various levels of RAID are:
• RAID 0: Data is striped across two or more disks, without parity information, redundancy, or fault tolerance. Since it provides no redundancy or fault tolerance, if one drive fails the entire array’s data is lost. RAID 0 is typically used to achieve high speed and increased performance.
• RAID 1: It provides fault tolerance using data mirroring. An exact copy of the data is kept on two or more disks. It offers no parity or spanning of disk space. Writes can be slower, since two or more copies of the data must be made.
• RAID 2: It uses bit-level striping, i.e., it splits data at the bit level and saves the bits across different data disks and redundancy disks.
• RAID 3: It uses byte-level striping, with a dedicated parity disk for storing parity bits. The disks must spin in a coordinated way, so that data is spread across the disks at the same physical locations. It is adequate for applications with high transfer rates and long sequential reads and writes.
• RAID 4: It uses block-level striping, with the first block written on drive 1, the second block on drive 2, and so on, plus a dedicated parity disk. It provides better read performance, but writing can be slow because the parity data must be written to a single disk. It does not require synchronized spinning.
• RAID 5: It uses block-level striping with distributed parity. Since parity is distributed among the drives, no single drive becomes a bottleneck. It is the most popular level, as it provides a good mix of speed, storage, and data recoverability (see the parity sketch after this list).
• RAID 6: It is an extension of RAID 5 with two parity blocks per stripe. It enables the system to continue operating even if it faces two concurrent disk failures. It carries a performance penalty on write operations because of the parity calculations.
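The distributed parity of RAID 5 (and the dedicated parity of RAID 3/4) is essentially a byte-wise XOR across the data blocks of a stripe. The Python sketch below, using made-up four-byte blocks, shows how the parity block is computed and how a lost block can be rebuilt from the survivors.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data blocks of one stripe
parity = xor_blocks([d0, d1, d2])        # stored on a fourth drive

# If the drive holding d1 fails, its block is rebuilt from the survivors:
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1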

ANSWER 6[B]:
Multithreading is the ability of a CPU to execute multiple threads concurrently, which helps maximize CPU utilization and reduce idle time. Programmers can divide long programs into threads and execute them in parallel, which increases the speed of execution and allows multiple tasks to be carried out simultaneously.
Differences between process and thread are:
• A process is an instance of a computer program in execution, whereas a thread is the smallest sequence of instructions that can be scheduled on a processor.
• Processes are independent, with their own code, data, and files, whereas a thread is a subset of a process that shares the data and files of its parent process with its sibling threads.
• Each process has its own address space, isolated from other processes. Threads belonging to the same parent process share a single address space.
• Processes can only communicate through inter-process communication, which is quite heavyweight because of their independent memory. Inter-thread communication is fast because of the shared memory.
• Process management is an expensive task, as it includes allocation and deallocation of resources, a PCB entry, and memory. Thread handling is much lighter, as threads hold no resources except for a stack and a copy of the registers.
• A context switch between processes is heavy, as it requires moving from one PCB to another. A thread context switch is cheap, as it does not change the virtual memory map (a small shared-counter sketch follows).
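The shared address space of threads can be seen in the minimal Python sketch below (the thread count and iteration count are arbitrary): all four threads update the same counter variable, something separate processes could not do without explicit inter-process communication.

import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        with lock:            # threads share the same address space,
            counter += 1      # so access to it must be synchronized

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000: all threads updated the same variable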
