COMPUTER ORGANIZATION AND ARCHITECTURE
For COMPUTER SCIENCE
SYLLABUS
Machine instructions and addressing modes, ALU and data-path, CPU control design,
Memory interface, I/O interface (Interrupt and DMA mode), Instruction pipelining,
Cache and main memory, Secondary storage.
© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission
CONTENTS
Topics
1. OVERVIEW OF COMPUTER SYSTEM
1.1 Introduction
1.2 Functional Units
1.3 Numbers and Arithmetic Operations
1.4 Decimal Fixed-Point Representation
1.5 Floating Point Representation
1.6 Signed-Operand Multiplication
1.7 Booth’s Algorithm
1.8 Integer Division
1.9 Non-Restoring-Division Algorithm
1.10 Floating-Point Numbers and Operations
2. INTRODUCTIONS
3. MEMORY ORGANIZATION
3.1 Introduction
3.2 Memory Hierarchy
3.3 Memory Characteristics
3.4 Semiconductor RAM Memories
3.5 Virtual Memory Technology
3.6 Advantages of using Virtual Memory
3.7 Paging, Segmentation and Paged Segments
3.8 Secondary Memory Technology
4.7 Responsibilities of I/O Interface
4.8 IBM 370 I/O Channel
4.9 Polling
4.10 Independent Requesting
4.11 Local Communication
6. GATE QUESTIONS
1 OVERVIEW OF COMPUTER SYSTEM
Any other arithmetic or logic operation, such as multiplication or division, is initiated by bringing the required operands into the processor, where the operation is performed by the ALU.
The control unit and the arithmetic and logic unit are many times faster than the other devices connected to a computer system. This enables a single processor to control a number of external devices such as keyboards, displays, and magnetic and optical disks.

1.2.4 Output Unit
The output unit is the counterpart of the input unit. Its function is to send processed results to the outside world, for example to a printer.

1.2.5 Control Unit
The memory, arithmetic and logic, and input and output units store and process information and perform input and output operations. The control unit is effectively the centre that sends control signals to other units and senses their states.

The operation of a computer:
The computer accepts information in the form of programs and data through an input unit and stores it in the memory.
Information stored in the memory is fetched, under program control, into an arithmetic and logic unit, where it is processed.
Processed information is output through an output unit, and all activities inside the machine are directed by the control unit.

1.3 NUMBERS & ARITHMETIC OPERATIONS
Computers are built using logic circuits that operate on information represented by two values, 0 and 1. The amount of information carried by one such two-valued digit is defined as a bit.

1.3.1 Number Representation
Consider an n-bit vector
$C = c_{n-1} \ldots c_1 c_0$
where $c_i = 0$ or $1$ for $0 \le i \le n-1$. This vector can represent unsigned integer values V in the range 0 to $2^n - 1$, where
$V(C) = c_{n-1} \times 2^{n-1} + \cdots + c_1 \times 2^1 + c_0 \times 2^0$
Three systems are used for representing positive and negative numbers:

1. Sign and Magnitude
The leftmost bit is 0 for positive numbers and 1 for negative numbers. Negative values are represented by changing the most significant bit from 0 to 1 in the vector C of the corresponding positive value.
For example:
+5  0101
-5  1101

2. 1’s Complement
The leftmost bit is 0 for positive numbers and 1 for negative numbers. Negative values are obtained by complementing each bit of the corresponding positive number.
For example, the representation of -3 is found by complementing each bit in the vector 0011 to yield 1100. The same operation is used for converting a negative number to the corresponding positive value. Forming the 1’s complement of a given number is equivalent to subtracting that number from $2^n - 1$.

3. 2’s Complement
The leftmost bit is 0 for positive numbers and 1 for negative numbers. Forming the 2’s complement of a number is done by subtracting that number from $2^n$. Hence, the 2’s complement of a number is obtained by adding 1 to the 1’s complement of that number.
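The three signed representations can be compared with a short Python sketch. The function names and the `bits` helper are illustrative choices, not part of the original notes; the code simply applies the rules stated above (flip the sign bit, subtract from 2^n - 1, subtract from 2^n).

```python
def sign_magnitude(value, n):
    """n-bit sign-and-magnitude pattern: sign bit followed by the magnitude."""
    sign = 1 if value < 0 else 0
    return (sign << (n - 1)) | abs(value)


def ones_complement(value, n):
    """n-bit 1's complement pattern: negatives are (2**n - 1) - magnitude."""
    if value >= 0:
        return value
    return (2**n - 1) - abs(value)


def twos_complement(value, n):
    """n-bit 2's complement pattern: negatives are 2**n - magnitude."""
    if value >= 0:
        return value
    return 2**n - abs(value)


def bits(pattern, n):
    """Render a bit pattern as an n-character binary string."""
    return format(pattern, f"0{n}b")


if __name__ == "__main__":
    for v in (5, -5, -3):
        print(v,
              bits(sign_magnitude(v, 4), 4),
              bits(ones_complement(v, 4), 4),
              bits(twos_complement(v, 4), 4))
```

For a negative value, the 2's complement pattern is always one more than the 1's complement pattern, which matches the last statement above.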
1.3.2 Arithmetic Addition

1. In Signed-Magnitude form
Addition follows the rules of ordinary arithmetic.
If the signs are the same, add the two magnitudes and give the sum the common sign.
If the signs are different, subtract the smaller magnitude from the larger and give the result the sign of the larger magnitude.
For example:
(+35) + (-37) = -(37 - 35) = -2

2. In 2’s complement form
The system does not require a comparison or subtraction; only addition and complementation are necessary.
The procedure is as follows: add the two numbers including their sign bits and discard any carry out of the sign (leftmost) bit position.

In subtraction, too, a carry out of the sign bit position is discarded. Changing a positive number to a negative number is easily done by taking its 2’s complement, and vice versa.
For example: (-6) - (-13) = +7
In binary format, it is written as
11111010 - 11110011
The subtraction is changed to addition by taking the 2’s complement of the subtrahend (-13) to give (+13). In binary format this is
11111010 + 00001101 = 100000111
and removing the end carry, we obtain the answer 00000111 (+7).

1.3.4 Overflow in Integer Arithmetic

In the 2’s complement number representation system, n bits can represent values in the range $-2^{n-1}$ to $+2^{n-1} - 1$.
1.4 DECIMAL FIXED-POINT REPRESENTATION
The representation of decimal numbers in registers is a function of the binary code used to represent a decimal digit. A 4-bit decimal code requires four flip-flops for each decimal digit.

Disadvantages
Representing numbers in decimal wastes a considerable amount of storage space, since the number of bits needed to store a decimal number in a binary code is greater than the number of bits needed for its equivalent binary representation.
The circuits required to perform decimal arithmetic are more complex.

Advantage
Applications like business data processing require only small amounts of arithmetic computation, which can be carried out directly in decimal format.

The representation of signed decimal numbers in BCD is similar to the representation of signed numbers in binary. The sign of a decimal number is usually represented with four bits to conform with the 4-bit code of the decimal digits.
The signed-magnitude system is difficult to use with computers. The signed-complement system can be either the 9’s or the 10’s complement, but the 10’s complement is the one most commonly used. To obtain the 10’s complement of a BCD number, we first take the 9’s complement and then add one to the least significant digit. The 9’s complement is calculated by subtracting each digit from 9.
The subtraction of decimal numbers, either unsigned or in the signed 10’s complement system, is performed in the same way: take the 10’s complement of the subtrahend and add it to the minuend.

1.5 FLOATING POINT REPRESENTATION
Example: the decimal number +6132.789 is represented in floating point as
Fraction: +0.6132789
Exponent: +04
Floating point is always interpreted to represent a number in the form $m \times r^e$. Only the mantissa m and the exponent e are physically represented in the register (including their signs). The radix r and the radix-point position of the mantissa are always assumed.
A floating point binary number is represented in a similar manner except that it uses base 2 for the exponent.
For example, the binary number +1001.11 is represented with an 8-bit fraction and a 6-bit exponent as follows:
Fraction    Exponent
01001110    000100

A floating point number is said to be normalized if the most significant digit of the mantissa is nonzero.
For example, the decimal number 250 is normalized but 00035 is not.
Regardless of where the position of the radix point is assumed to be in the mantissa, the number is normalized only if its leftmost digit is nonzero.
For example, the 8-bit binary number 00011010 is not normalized because of its three leading 0’s. The number can be normalized by shifting it three positions to the left and discarding the leading 0’s to obtain 11010000. Normalized numbers provide the maximum possible precision for the floating point number.
A zero cannot be normalized because it has no nonzero digit; it is usually represented in floating point by all 0’s in the mantissa and exponent.
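The fraction/exponent example for +1001.11 can be reproduced with a small Python sketch. The function name and its restrictions (positive, nonzero input; non-negative exponent; first fraction bit used as a 0 sign bit) are assumptions made for illustration, chosen to match the 8-bit fraction and 6-bit exponent layout shown above.

```python
def to_fraction_exponent(binary, fraction_bits=8, exponent_bits=6):
    """Convert a positive binary string such as '1001.11' into
    (fraction_field, exponent_field) with value = 0.f * 2**e.

    Assumptions: the number is positive and nonzero, and the normalized
    exponent is non-negative (no sign handling for the exponent field).
    """
    int_part, _, frac_part = binary.partition(".")
    digits = int_part + frac_part
    if "1" not in digits:
        raise ValueError("zero cannot be normalized")
    exponent = len(int_part)          # radix point moved to the far left
    while digits[0] == "0":           # normalize: drop leading zeros
        digits = digits[1:]
        exponent -= 1
    if exponent < 0:
        raise ValueError("negative exponents are not handled in this sketch")
    magnitude = digits[:fraction_bits - 1].ljust(fraction_bits - 1, "0")
    fraction_field = "0" + magnitude  # leading 0 is the sign bit (positive)
    exponent_field = format(exponent, f"0{exponent_bits}b")
    return fraction_field, exponent_field


if __name__ == "__main__":
    print(to_fraction_exponent("1001.11"))
```

The leading-zero loop is the normalization step: each discarded 0 corresponds to one left shift of the mantissa and a decrement of the exponent.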
1.6 SIGNED-OPERAND MULTIPLICATION

The multiplication of signed operands generates a double-length product in the 2’s complement number system. In general, we accumulate partial products by adding versions of the multiplicand as selected by the multiplier bits.
Case (i): Positive multiplier and negative multiplicand.
When we add a negative multiplicand to a partial product, we must extend the sign-bit value of the multiplicand to the left as far as the product will extend.
For example: the 5-bit signed operand -13 is the multiplicand and it is multiplied by +11, to get the product -143.

… multiplicand, as in the standard procedure. However, we can reduce the number of required operations by regarding this multiplier as the difference between two numbers:
  0100000 (32)
- 0000010 (2)
_________________
  0011110 (30)
This suggests that the product can be generated by adding $2^5$ times the multiplicand to the 2’s complement of $2^1$ times the multiplicand. For convenience, we can describe the sequence of required operations by recoding the preceding multiplier as 0 +1 0 0 0 -1 0.
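Both ideas in this section can be checked with a short Python sketch: sign-extending a negative multiplicand while accumulating partial products, and rewriting a multiplier of 30 as 32 - 2. The function names are illustrative and the code assumes a non-negative multiplier, as in Case (i).

```python
def multiply_sign_extended(multiplicand, multiplier, n):
    """Multiply a signed n-bit multiplicand by a non-negative n-bit
    multiplier, returning the 2n-bit 2's complement product pattern."""
    width = 2 * n
    mask = (1 << width) - 1
    # Masking a negative Python int yields its sign-extended 2n-bit pattern.
    extended = multiplicand & mask
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:       # multiplier bit i selects a summand
            product = (product + (extended << i)) & mask
    return product


def to_signed(pattern, width):
    """Interpret a bit pattern of the given width as a signed value."""
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


if __name__ == "__main__":
    p = multiply_sign_extended(-13, 11, 5)
    print(format(p, "010b"), to_signed(p, 10))
    m = -13
    # A multiplier of 30 needs only two summands: 32*M and -(2*M).
    print((m << 5) - (m << 1), 30 * m)
```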
… multiplier is scanned from right to left. Figure 9 illustrates the normal and the Booth’s algorithms for the example just discussed. The Booth’s algorithm clearly extends to any number of blocks of 1s in a multiplier, including the situation in which a single 1 is considered a block; see figure 10 for another example of recoding the multiplier. In this example, the least significant bit is 1. This situation is uniformly handled by assuming that an implied 0 lies to its right.
The Booth’s algorithm can also be used for a negative multiplier, as the figure shows.
To see the correctness of this technique in general, we use a property of negative number representations in the 2’s complement system. Let the leftmost zero of a negative number, X, be at bit position k, that is
$X = 1 1 \ldots 1 0 x_{k-1} \ldots x_0$
The value of X is given by
$V(X) = -2^{k+1} + x_{k-1} \times 2^{k-1} + \cdots + x_0 \times 2^0$
This is supported by observing that
    11…..100 ….0
+   00……00xk-1 ….x0
________________________
X = 11…..10xk-1 …..x0
The top number is the 2’s complement representation of $-2^{k+1}$. The recoded multiplier now consists of the part corresponding to the second number, with -1 added in position k+1. For example, the multiplier 110110 becomes 0 -1 +1 0 -1 0.

Table: Booth multiplier recoding table

Multiplier          Version of multiplicand
Bit i    Bit i-1    selected by bit i
0        0          0 × M
0        1          +1 × M
1        0          -1 × M
1        1          0 × M

The Booth’s technique for recoding multipliers is summarized in the above table. The transformation 011…110 → +1 0 0 … 0 -1 0 is called skipping over 1s. This term is derived from the case in which the multiplier has its 1s grouped into a few contiguous blocks; only a few versions of the multiplicand, that is, the summands, must be added to generate the product, thus speeding up the multiplication operation. However, in the worst case, that of alternating 1s and 0s in the multiplier, each bit of the multiplier selects a summand. In fact, this results in more summands than if the Booth algorithm were not used. A 16-bit worst-case multiplier, an ordinary multiplier, and a good multiplier are shown in figure 12.
The Booth’s algorithm has three attractive features:
1. It handles both positive and negative multipliers uniformly.
2. It achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
3. The speed gained by skipping over 1s depends on the data. On average, the speed of doing multiplication with the Booth’s algorithm is the same as with the normal algorithm.
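The recoding table can be expressed as a few lines of Python. This is a minimal sketch with hypothetical function names: each recoded digit is simply (bit i-1) minus (bit i), with an implied 0 to the right of the least significant bit, which reproduces the four rows of the table.

```python
def booth_recode(bits):
    """Recode a 2's complement multiplier (string, MSB first) into
    Booth digits from {-1, 0, +1}, also MSB first."""
    padded = bits + "0"  # implied 0 to the right of the LSB
    return [int(padded[j + 1]) - int(padded[j]) for j in range(len(bits))]


def booth_multiply(multiplicand, multiplier_bits):
    """Multiply by summing +/- shifted versions of the multiplicand
    selected by the recoded multiplier digits."""
    digits = booth_recode(multiplier_bits)
    n = len(digits)
    product = 0
    for j, d in enumerate(digits):
        position = n - 1 - j           # weight of this digit is 2**position
        product += d * (multiplicand << position)
    return product


if __name__ == "__main__":
    print(booth_recode("0011110"))      # multiplier 30
    print(booth_recode("110110"))       # negative multiplier -10
    print(booth_multiply(-13, "01011")) # -13 x +11
```

Because the recoded digits of 110110 sum to -16 + 8 - 2 = -10, the same loop handles negative multipliers without any special case.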
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
The q0 bit is appropriately set to 0 or 1 after the correct operation has been performed.

1.9 NON-RESTORING-DIVISION ALGORITHM

Step 1:
Do the following n times.
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
If the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2:
If the sign of A is 1, add M to A. Step 2 is needed to leave the proper positive remainder in A at the end of n cycles.
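The two steps above can be sketched in Python as follows. The function name is illustrative; A is kept as an ordinary signed Python integer, so "the sign of A is 1" becomes `a < 0`, and the sketch assumes an unsigned n-bit dividend and a nonzero unsigned divisor.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Non-restoring division of an unsigned n-bit dividend by an
    unsigned divisor. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):                      # Step 1: repeat n times
        a_was_negative = a < 0
        # Shift A and Q left one position: the MSB of Q moves into A.
        a = 2 * a + ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        if a_was_negative:
            a += m                          # sign of A was 1: add M
        else:
            a -= m                          # sign of A was 0: subtract M
        if a >= 0:
            q |= 1                          # set q0 to 1, otherwise leave 0
    if a < 0:                               # Step 2: final correction
        a += m
    return q, a


if __name__ == "__main__":
    print(nonrestoring_divide(8, 3, 4))     # quotient 2, remainder 2
```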
In the 2’s complement system, the signed value F, represented by the n-bit binary fraction
$C = c_0 . c_{-1} c_{-2} \ldots c_{-(n-1)}$
is given by
$F(C) = -c_0 \times 2^0 + c_{-1} \times 2^{-1} + c_{-2} \times 2^{-2} + \cdots + c_{-(n-1)} \times 2^{-(n-1)}$
where the range of F is
$-1 \le F \le 1 - 2^{-(n-1)}$
Consider the range of values representable in a 32-bit, signed, fixed-point format. Interpreted as integers, the value range is approximately 0 to $\pm 2.15 \times 10^{9}$. If we consider them to be fractions, the range is approximately $\pm 4.66 \times 10^{-10}$ to $\pm 1$.
Neither range is sufficient on its own: we need to accommodate both very large integers and very small fractions. To do this, a computer must be able to represent numbers and operate on them in such a way that the position of the binary point is variable and is automatically adjusted as computation proceeds.
Such a representation is called floating point representation.
Because the position of the binary point in a floating point number is variable, it must be given explicitly in the floating point representation.
By convention, when the decimal point is placed to the right of the first (nonzero) significant digit, the number is said to be normalized.
Thus, floating point number representation is a number representation in which a number is represented by its sign, a string of significant digits, known as the mantissa, and an exponent to an implied base for the scale factor.

1.10.1 IEEE STANDARD FOR FLOATING-POINT NUMBERS

A general form is
$\pm X_1 . X_2 X_3 X_4 X_5 X_6 X_7 \times 10^{\pm Y_1 Y_2}$
where $X_i$ and $Y_i$ are decimal digits. Both
(i) the number of significant digits (7), and
(ii) the exponent range ($\pm 99$)
are sufficient for a wide range of calculations. It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits. A 24-bit mantissa can approximately represent a 7-digit decimal number, and an 8-bit exponent to an implied base of 2 provides a scale factor with a reasonable range. One bit is needed for the sign of the number. Because the leading nonzero bit of a normalized binary mantissa must be a 1, it does not have to be included explicitly in the representation. Thus, a total of 32 bits is needed.
The standard explained above for representing floating-point numbers in 32 bits has been developed and specified in detail by the Institute of Electrical and Electronics Engineers (IEEE). This standard describes both the representation and the way in which the four basic arithmetic operations are to be performed.
The 32-bit representation is given in the figure below.

The sign of the number is given in the first bit, followed by a representation for the exponent (to the base 2) of the scale factor.
Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E’ = E + 127. This is called the excess-127 format. Thus E’ is in the range $0 \le E' \le 255$. The end values of the range, 0 and 255, are used to represent special values. Therefore, the range of E’ for normal values is $1 \le E' \le 254$, that is, the actual exponent E is in the range $-126 \le E \le 127$.
… $2^{-1022}$ to $2^{1023}$. The 53-bit mantissa …
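The 32-bit single-precision layout described above (1 sign bit, 8-bit excess-127 exponent, 23 stored mantissa bits) can be inspected with Python's standard `struct` module. The function name `decode_float32` is an illustrative choice, and the sketch only interprets normal values.

```python
import struct


def decode_float32(x):
    """Split a value, stored as IEEE single precision, into its
    (sign, stored exponent E', 23-bit fraction) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF   # E' = E + 127
    fraction = bits & 0x7FFFFF              # leading 1 is implied, not stored
    return sign, stored_exponent, fraction


def describe(x):
    """Return the true exponent E and the mantissa 1.f for a normal value."""
    sign, e_stored, fraction = decode_float32(x)
    if not 1 <= e_stored <= 254:
        raise ValueError("only normal values are handled in this sketch")
    mantissa = 1 + fraction / 2**23
    return sign, e_stored - 127, mantissa


if __name__ == "__main__":
    print(decode_float32(1.0))
    print(describe(-6.5))   # -6.5 = -1.625 * 2**2
```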
1.10.3 Arithmetic operations on floating-point numbers:
The rules for addition and subtraction can be stated as follows:
Add/Subtract Rule
1. Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the result.
4. Normalize the resulting value, if necessary.
Multiplication and division are somewhat easier than addition and subtraction, in that no alignment of mantissas is needed.
Multiply Rule
1. Add the exponents and subtract 127.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.

1.10.6 Precision Consideration
Prior to a floating point operation, the exponent and significand of each operand are loaded into the ALU registers.
In the case of the significand, the length of the register is almost always greater than the length of the significand plus an implied bit. The register contains additional bits, called guard bits, which are used to pad out the right end of the significand with 0’s.

1.10.7 Rounding
A number of techniques have been explored for performing rounding.
1. Round to nearest: The result is rounded to the nearest representable number.
2. Round toward +∞: The result is rounded up toward plus infinity.
3. Round toward -∞: The result is rounded down toward negative infinity.
4. Round toward 0: The result is rounded toward zero.
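To connect the multiply rule with the rounding modes, the following Python sketch multiplies two single-precision values field by field. It is a simplified illustration, not a full IEEE 754 implementation: the function names are hypothetical, inputs and result are assumed to be normal values (no zero, infinity, NaN or denormals), and the low-order product bits are simply truncated, which corresponds to round toward 0.

```python
import struct


def fields(x):
    """(sign, stored exponent E', 23-bit fraction) of a single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def multiply_fields(x, y):
    """Multiply two normal single-precision values using the multiply rule."""
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    sign = sx ^ sy                      # sign of the result
    exponent = ex + ey - 127            # rule 1: add exponents, subtract 127
    mx = (1 << 23) | fx                 # restore the implied leading 1
    my = (1 << 23) | fy
    product = mx * my                   # rule 2: 48-bit mantissa product
    if product >> 47:                   # rule 3: product in [2, 4), normalize
        product >>= 24
        exponent += 1
    else:                               # product in [1, 2)
        product >>= 23
    fraction = product & 0x7FFFFF       # dropped bits are truncated
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    print(multiply_fields(1.5, 2.5))
    print(multiply_fields(-6.5, 0.25))
    print(multiply_fields(3.0, 3.0))
```

Subtracting 127 is needed because each stored exponent already includes the excess of 127, so their sum would otherwise carry the excess twice.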
(+∞) - (-∞) = +∞

Quiet and Signaling NaNs
A NaN is a symbolic entity encoded in floating-point format, of which there are two types:
(i) Signaling
(ii) Quiet

(i) Signaling
A signaling NaN signals an invalid operation exception whenever it appears as an operand. Signaling NaNs afford values for uninitialized variables and arithmetic-like enhancements that are not the subject of the standard.
(ii) Quiet
A quiet NaN propagates through almost every arithmetic operation without signaling an exception.

Note: Both types of NaNs have the same general format: an exponent of all ones and a nonzero fraction. The actual bit pattern of the nonzero fraction is implementation dependent; the fraction values can be used to distinguish quiet NaNs from signaling NaNs and to specify particular exception conditions.

1.10.9 Table: Operations that Produce a Quiet NaN

Operation          Quiet NaN produced by
Any                Any operation on a signaling NaN
Add or subtract    Magnitude subtraction of infinities:
                   (+∞) + (-∞)
                   (-∞) + (+∞)
                   (+∞) - (+∞)
                   (-∞) - (-∞)
Multiply           0 × ∞
Division           0/0 or ∞/∞
Remainder          x REM 0 or ∞ REM y
Square root        √x where x < 0

1.10.10 Denormalized Numbers
Denormalized numbers are useful for handling exponent underflow, and therefore they are included in IEEE 754.
When the exponent of the result becomes too small (a negative exponent with too large a magnitude), the result is denormalized by right shifting the fraction and incrementing the exponent for each shift, until the exponent is within a representable range.
The above figure explains the effect of the addition of denormalized numbers.
The representable numbers can be grouped into intervals of the form $[2^n, 2^{n+1}]$.
Within each such interval, the exponent portion of the number remains constant while the fraction varies, producing a uniform spacing of representable numbers within the interval.
As we approach zero, each successive interval is half the width of the preceding interval but contains the same number of representable numbers. Hence, the density of representable numbers increases as we approach zero.
If only normalized numbers are used, there is a gap between the smallest normalized number and 0. In the case of the 32-bit IEEE 754 format, there are $2^{23}$ representable numbers in each interval, and the smallest representable positive number is $2^{-126}$. With the addition of denormalized numbers, an additional $2^{23} - 1$ numbers are uniformly added between 0 and $2^{-126}$.
Without denormalized numbers, the gap between the smallest representable nonzero number and zero is much wider than the gap between the smallest representable nonzero number and the next larger number.
The use of denormalized numbers is referred to as gradual underflow.
Gradual underflow fills in the gap and reduces the impact of exponent underflow to a level comparable with roundoff among the normalized numbers.
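Several of these statements can be observed directly in Python, whose `float` type follows IEEE 754 on common platforms (an assumption of this sketch). The helper `float32_from_bits` is a hypothetical name; it uses the standard `struct` module to build single-precision values from raw bit patterns.

```python
import math
import struct

INF = float("inf")

# Operations from the table that yield a quiet NaN without raising in Python.
nan_results = [
    INF + (-INF),
    (-INF) + INF,
    INF - INF,
    (-INF) - (-INF),
    0.0 * INF,
    INF / INF,
]


def float32_from_bits(bits):
    """Build a single-precision value from its 32-bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest_normal = float32_from_bits(0x00800000)    # E' = 1, fraction = 0
smallest_denormal = float32_from_bits(0x00000001)  # E' = 0, fraction = 1
largest_denormal = float32_from_bits(0x007FFFFF)   # E' = 0, fraction all 1s

# A NaN stored in single precision: exponent all ones, nonzero fraction.
(nan_bits,) = struct.unpack(">I", struct.pack(">f", float("nan")))

if __name__ == "__main__":
    print(all(math.isnan(v) for v in nan_results))
    print(smallest_normal == 2.0**-126, smallest_denormal == 2.0**-149)
    print(format(nan_bits, "032b"))
```

The spacing between denormalized numbers, 2^-149, equals the spacing just above the smallest normalized number, which is what makes the underflow gradual.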
2 INTRODUCTIONS
… memory or I/O. On the second line, five circles (operations) represent the internal CPU operations. The states can be described as:
Instruction Address Calculation (iac): Determine the address of the next instruction to be executed.
Instruction Fetch (if): Read the instruction from its memory location into the CPU.
Instruction Operation Decoding (iod): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
Operand Address Calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
Operand Fetch (of): Fetch the operand from memory or read it in from I/O.
Data Operation (do): Perform the operation indicated in the instruction.
Operand Store (os): Write the result into memory or out to I/O.

2.3 INSTRUCTION FORMATS

A program consists of a sequence of instructions, each one specifying some particular action.

• Typical Instruction Formats

Addressing Modes

1. Immediate Addressing

Description :
The operand is directly specified in the operand field.
The instruction is a multiword instruction, where the operands immediately follow the op code.
Both the op code and the operand are fetched from memory using the program counter.
Uses of the immediate addressing mode
→ Loading internal registers with an initial value.
→ Performing arithmetic or logical operations on immediate data.

2. Direct Addressing
… fetched from memory by using the program counter.
The direct address available is then used to access the operand.

3. Extended Addressing

Description :
The effective memory address is directly specified with the instruction. It uses a 16-bit address.
This addressing is a slow way of accessing memory because the instruction is 3 bytes long and requires 3 memory accesses using the PC to acquire the instruction.

4. Register Addressing

Description :
In this, the instruction op code specifies the CPU registers where the operand is stored.
Two points about the implementation
→ When two registers are specified, one will be used as source while the other will be used as destination.
→ Using internal registers instead of memory for operands makes instructions in this mode execute faster than instructions in other modes.

5. Indirect Addressing

Description :
In this indirect addressing mode, the instruction contains an address that points to the memory location where the effective direct address to be used for the operand is stored.

6. Register Indirect format

Description :
In this, the instruction opcode specifies an internal register or register pair which contains the effective address to be used for accessing the operand in memory.
This mode is used to save program space and improve the speed of program execution in situations where data elements are to be accessed from memory.

7. Base Addressing
In this, the opcode specifies a register that contains an address. The instruction also contains an offset field that contains a displacement.
The effective address is formed by addition of the base address and the displacement value.

8. Indexed Addressing

9. Base Indexed Addressing

Description :
This is the combination of two modes, i.e., base addressing and index addressing.
In this, the instruction op code specifies two registers: a base register that contains the base address and an index register that contains an index value.
An instruction in this mode can optionally use an 8-bit or 16-bit displacement. If a displacement is used, it is also added to get the effective address.

10. Relative Addressing

Description :
In this, the operand comes from a location relative to the position of the executed instruction.
The operand effective address = contents of the program counter + the signed value specified by the instruction in its address field.

1. Three-address instructions
Example:
ADD R1, A, B
Description :
A processor with a three-address instruction format can use each address field to specify either a processor register or a memory operand.
Advantage
The three-address format results in short programs when evaluating arithmetic operations.
Disadvantage
Binary coded instructions require too many bits to specify the operands.
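Looking back at the addressing modes described above, their effective-address rules can be summarized in a small Python sketch. The function, the mode names and the dictionary-based instruction layout are all hypothetical illustrations, not a real instruction set; immediate and register addressing are omitted because they do not compute a memory address, and indexed addressing is omitted because its description is not given above.

```python
def effective_address(mode, instr, registers, memory, pc):
    """Return the operand's effective memory address for one instruction.

    instr     -- dict with the fields the mode needs (illustrative layout)
    registers -- dict mapping register names to their contents
    memory    -- dict mapping addresses to stored words
    pc        -- contents of the program counter
    """
    if mode in ("direct", "extended"):
        return instr["address"]                     # address is in the instruction
    if mode == "indirect":
        return memory[instr["address"]]             # instruction points to the address
    if mode == "register_indirect":
        return registers[instr["register"]]         # register holds the address
    if mode == "base":
        return registers[instr["base"]] + instr["displacement"]
    if mode == "base_indexed":
        return (registers[instr["base"]]
                + registers[instr["index"]]
                + instr.get("displacement", 0))     # displacement is optional
    if mode == "relative":
        return pc + instr["displacement"]           # signed offset from the PC
    raise ValueError(f"unknown addressing mode: {mode}")


if __name__ == "__main__":
    regs = {"R1": 100, "R2": 8}
    mem = {500: 720}
    print(effective_address("indirect", {"address": 500}, regs, mem, pc=200))
    print(effective_address("relative", {"displacement": -6}, regs, mem, pc=200))
```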
2. Two-address instructions
Format:
OPCODE ADDRESS1 ADDRESS2

Example:
MOV R1, A
Description :
In this, each address field can specify either a processor register or a memory operand.
The first symbol listed in an instruction is assumed to be both a source and the destination where the result of the operation is transferred.

3. One-address instructions
Format:
OPCODE ADDRESS1
Example:
LOAD A
Description :
One-address instructions use an implied accumulator register for all data manipulation.
All operations are done between the accumulator and a memory operand.

4. Zero-address instructions
Format:
OPCODE
Example:
ADD
MUL

Description :
A stack-organized computer does not use an address field for these instructions.

Expanding Op codes

Consider an (n+k)-bit instruction with a k-bit opcode and a single n-bit address. This instruction allows $2^k$ different operations and $2^n$ addressable memory cells. Alternatively, the same n+k bits could be broken up into a (k-1)-bit opcode and an (n+1)-bit address.
A (k+1)-bit opcode and an (n-1)-bit address gives more operations, but the price is either a smaller number of addressable cells or poorer resolution and the same amount of addressable memory.

2.4 INSTRUCTION INTERPRETATION

Instruction interpretation is used for activating the control signals that cause the data processing unit to execute the instruction. The control signals are transmitted from the control unit to the outside through control lines.

Control Specification

The four groups of control signals have the following functions.
No.  Signal   Description
1.   C’out    These signals directly control the operation of the data processing unit. The main function of the control unit is to generate C’out.
2.   C’in     These signals enable the data being processed to influence the control unit, allowing data-dependent decisions to be made. An important function of C’in is to indicate the occurrence of unusual conditions such as errors in the data processing unit.
3.   C”out    These signals are transmitted to other control units and may indicate status conditions such as “busy” or “operation completed”.
4.   C”in     These signals are received from other control units. They typically include start and stop signals and timing information. C”in and C”out are primarily used to synchronize the control unit with the operation of other control units.
2.4.1 IMPLEMENTATION METHODS

Hardwired Control
In this approach, the control unit is designed as a fixed logic circuit that interprets instructions and generates control signals from them.
Design Methods
The design of a hardwired control unit involves various complex tradeoffs between the amount of hardware used, its speed of operation and the cost of the design process itself.
We consider three simplified and systematic approaches to the design of hardwired controllers.
State-table Method
It is the standard algorithmic approach to sequential circuit design.
The behavior required of a control unit, like that of any finite-state sequential machine, can be represented by a state table as shown in the figure below.
Let Cin and Cout denote the input and output variables of the control unit.
The rows of the state table correspond to the set of internal states {Si} of the machine.
The columns of the state table correspond to the set of external input signals to the control unit.
The entry in row Si and column Ij has the form Si,j, Zi,j, where Si,j denotes the next state of the control unit and Zi,j denotes the output signals from Cout that are activated by the application of Ij to the control unit when it is in state Si.

The state-table method has the following disadvantages:
(I) The number of state and input combinations may be so large that the state-table size and the amount of computation needed become excessive.
(II) State tables tend to conceal useful information about a circuit’s behavior, for example, the existence of repeated patterns or loops.
(III) Control circuits designed from state tables also tend to have a random structure, which makes design, debugging and maintenance of the circuit difficult.

2.4.3 Delay-Element Method

A control unit using delay elements can be constructed directly from a flowchart that specifies the control-signal sequences required. Consider the problem of generating the following sequence of control signals at times t1, t2, …, tn using a hardwired control unit.
t1 : Activate {C1,j};
t2 : Activate {C2,j};
______________________________
tn : Activate {Cn,j};
activate {C2,j}. Similarly, another delay element of delay t3 - t2 with input START (t2) can be used to activate {C3,j}, and so on. Thus control signals can be generated by using delay elements.
To ensure synchronous operation, the delay elements are implemented by D flip-flops and controlled by a common clock signal. Since normally only one flip-flop is set or “hot” at any time and all other flip-flops are reset, this approach is also called the “one hot” method.

Disadvantages
The number of delay elements needed is approximately equal to the number of states, and each delay element is a sequential circuit of equal or greater complexity than a flip-flop.
The delay element approach produces expensive circuits in which timing is controlled by pulses traveling through cascades of clocked delay elements.
Synchronization of many widely distributed delay elements may also be difficult.

2.4.4 Sequence-Counter Method

Consider the circuit shown in the figure. It consists basically of a modulo-k counter whose output is connected to a 1/k clocked decoder. If the count enable input is connected to a clock source, the counter cycles continually through its k states.
The decoder generates k pulse signals {Φi} on its output lines. Consecutive pulses are separated by one clock period, as shown in the figure.
The {Φi} effectively divide the time required for one complete cycle of the counter into k equal parts.
Two additional input lines and flip-flops are provided for turning the counter on and off. A pulse on the begin line causes the counter to begin cycling through its states by logically connecting the count enable line to the clock source.

2.5 MICROPROGRAMMED CONTROL

Microprogramming is a method of control design in which the control signal selection and sequencing information is stored in a ROM or RAM called a control memory (CM).
Each microinstruction also explicitly or implicitly specifies the next microinstruction to be used, thereby providing the necessary sequencing information.
A set of related microinstructions is called a microprogram. In a microprogrammed CPU, each machine instruction is executed by a microprogram, which acts as a real-time interpreter for the instruction.
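The idea that a microprogram interprets a machine instruction can be sketched in a few lines of Python. This is only an illustration: the control memory contents, the signal names (PC_out, MAR_in and so on) and the use of None to mark the end of a microprogram are all assumptions made for this sketch, not part of any particular machine.

```python
# Hypothetical control memory: each microinstruction holds the control
# signals to activate and the address of the next microinstruction.
CONTROL_MEMORY = {
    0: {"signals": ["PC_out", "MAR_in"], "next": 1},
    1: {"signals": ["MEM_read"], "next": 2},
    2: {"signals": ["MDR_out", "IR_in"], "next": None},  # None ends the microprogram
}

def run_microprogram(control_memory, start_address):
    """Return the list of control-signal sets activated, one per microinstruction."""
    trace = []
    address = start_address  # plays the role of a control memory address register
    while address is not None:
        microinstruction = control_memory[address]
        trace.append(microinstruction["signals"])
        address = microinstruction["next"]  # sequencing information
    return trace

trace = run_microprogram(CONTROL_MEMORY, 0)
for step, signals in enumerate(trace, start=1):
    print(f"t{step}: activate {signals}")
```

Each pass through the loop corresponds to one microinstruction: the stored signals are activated and the stored next address selects the following microinstruction.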
(ii) Address field to activate the next microinstruction to be executed, as required to activate the control signals.

The control memory is organized in matrix form, i.e. rows and columns.
Rows: microinstructions
Columns: micro-steps or the address of the next instruction
The row to be activated is decided by a decoder, and only one output line is active at a time.
The input to the decoder is given by the control memory address register (CMAR).
The CMAR contains the address of the current microinstruction, which is used to read out that microinstruction.
The CMAR obtains the address of a microinstruction from one of two sources:
(i) An external address source, which gives the starting address of the microprogram stored in the control memory.
(ii) The address given by the address column lines of the current microinstruction.
The first microinstruction is activated, which provides the micro-steps and the address of the next microinstruction to be activated. This address is accepted by the CMAR and used to activate the next microinstruction.
This scheme also provides the facility of using external conditions or condition codes. This is provided by switches. A switch is activated when a row is active and is controlled by an external signal. The two possible conditions are used to activate two separate lines, which provide two separate addresses. These addresses can be used to provide conditional branching in microinstructions.

Horizontal Microinstructions:
The term ‘horizontal’ refers to the existence of a long control word that produces a horizontal pattern of 1’s and 0’s.
Horizontal microinstructions are able to control a variety of components operating in parallel.
A horizontal microinstruction may initiate simultaneous independent micro-operations for many registers, for a memory read or memory write operation, and for the generation of the next address, all in the same microinstruction.
Advantage
Efficient hardware utilization
Disadvantage
Control memory becomes expensive.

In a control word, the number of control bits can be reduced by grouping mutually exclusive variables into fields and encoding the k bits in each field to provide 2^k micro-operations.

2.7.1 Vertical Microinstructions

A microinstruction format which is not horizontal is called a vertical microinstruction.
It requires decoding circuits external to the control memory.
The term ‘vertical’ implies that the encoding of fields necessitates decoding circuits that form a vertical pattern, which may consist of one or two levels of decoding.

2.7.2 Difference between Hardwired control and Micro-programmed control
Attribute | Hardwired control | Microprogrammed control
Speed | Comparatively fast | Comparatively slow
Control system | Implemented in hardware | Implemented in software
Flexibility | Not flexible; accommodating new system specifications or new instructions requires redesign | More flexible; new system specifications or new instructions can be accommodated
Ability to handle large/complex instruction sets | Somewhat difficult | Easier
Ability to support operating systems and diagnostic features | Very difficult | Easier
Design process | Somewhat complicated | Orderly and systematic
Applications | Used in RISC microprocessors | Used in mainframes and some microprocessors
Instruction set size | Usually under 100 instructions | Usually over 100 instructions
Chip area efficiency | Uses less area | Uses more area
3 MEMORY ORGANIZATION
Consider a general n-level system of n memory types (M1, M2, …, Mn). Fig (1) shows some examples with n = 2, 3 and 4. Typical technologies used in these hierarchies are semiconductor SRAMs for cache memory, semiconductor DRAMs for main memory and magnetic disk units for secondary memory. The two-level hierarchy of Fig (1)-a is typical of early computers. Fig (1)-b adds a cache of a type called split cache, since it has separate areas for storing instructions (the I-cache) and data (the D-cache). The third example, Fig (1)-c, has two cache levels, both of the non-split or unified type. Embedded microcontrollers also use the various hierarchical organizations but often lack the secondary or the cache level.
The following relations normally hold between adjacent memory levels Mi and Mi+1 in a memory hierarchy.

Cost per bit: Ci > Ci+1
Access time: tAi < tAi+1
Storage capacity: Si < Si+1

The difference in cost, access time and capacity between Mi and Mi+1 can be several orders of magnitude. Considerable system resources are devoted to shielding the CPU from these differences, so that it almost always sees a very large and inexpensive memory space and rarely sees an access time greater than that of M1, the first level of the memory hierarchy.
The CPU and other processors can communicate directly with M1 only, M1 can communicate with M2, and so on. Consequently, for the CPU to read information held in some memory level Mi requires a sequence of i data transfers of the form
Mi-1 := Mi; Mi-2 := Mi-1; Mi-3 := Mi-2; …; CPU := M1
An exception is allowed in the case of caches: the CPU may be designed to bypass the cache levels and go directly to main memory. In general, all the information stored in Mi at any time is also stored in Mi+1, but not vice versa.

During program execution the CPU produces a steady stream of memory addresses. At any time, these addresses are distributed in some fashion throughout the memory hierarchy. If an address is generated that is currently assigned only to Mi, where i ≠ 1, the address must be reassigned to M1, the level of the memory hierarchy that the CPU can access directly. This relocation of addresses involves the transfer of data between levels Mi and Mi-1, a relatively slow process. For a memory hierarchy to work efficiently, the addresses generated by the CPU should be found in M1 as often as possible; that is, the required information should be transferred to M1 before it is actually used by the CPU. If the desired data cannot be found in M1, then the program originating the memory request must be suspended until an appropriate reallocation of storage is made.
Main and secondary memory form another two-level subhierarchy (Fig (1)-b). This interaction is managed by the operating system, however, and so is not transparent to system software, although it is somewhat transparent to the user code.

Example
Consider a two-level memory hierarchy m1 and m2, and let C1 and C2 be the cost per byte, t1 and t2 be the access times and S1 and S2 be the memory capacities for m1 and m2 respectively.
a) Under what conditions will the average cost of the entire memory system approach C2?
b) What is the effective memory access time ta of this hierarchy?
c) Express the access efficiency E in terms of the speed ratio and the hit ratio.
d) Plot E against H for r = 5, 20 and 100 respectively and comment on this performance.

We consider a two-level memory hierarchy (m1, m2). The average cost per bit of memory is given by:
$$C = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$

In the figure below, E is plotted as a function of H. This graph shows the importance of achieving high values of H in order to make E ≈ 1, i.e., ta ≈ t1.
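The quantities asked for in this example can be checked numerically with a short Python sketch. The function names are hypothetical, and the sketch assumes the usual definitions: effective access time ta = H·t1 + (1 - H)·t2, speed ratio r = t2/t1, and access efficiency E = t1/ta = 1/(H + (1 - H)·r). The capacities and costs passed in are illustrative values only.

```python
def average_cost(c1, s1, c2, s2):
    """Average cost per unit of storage for a two-level hierarchy."""
    return (c1 * s1 + c2 * s2) / (s1 + s2)

def effective_access_time(h, t1, t2):
    """ta = H*t1 + (1 - H)*t2, with H the hit ratio of level m1."""
    return h * t1 + (1 - h) * t2

def access_efficiency(h, r):
    """E = t1 / ta, written in terms of the speed ratio r = t2 / t1."""
    return 1.0 / (h + (1 - h) * r)

# Illustrative values: a small, expensive m1 and a large, cheap m2.
cost = average_cost(0.1, 2**10, 0.01, 2**16)
print(f"average cost = {cost:.5f}")

# E as a function of H for several speed ratios (the data behind a plot of E vs. H).
for r in (5, 20, 100):
    row = [round(access_efficiency(h, r), 3) for h in (0.5, 0.9, 0.99, 1.0)]
    print(f"r = {r:3d}: E = {row}")
```

From the cost formula, the average cost approaches C2 when S2 is much larger than S1, since the C1·S1 term then contributes little to the total. The efficiency rows show that E stays well below 1 unless H is very close to 1, especially for large r.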
$$C = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2} = \frac{0.1000 \times 1024 + 0.0100 \times 2^{16}}{2^{10} + 2^{16}} = \frac{2^{10}(0.1 + 0.01 \times 2^{6})}{2^{10}(1 + 2^{6})} = \frac{0.74}{65} = 0.01138$$
This is the average cost per bit.
$$t_a = H t_1 + (1 - H) t_2 = 0.9 \times 10^{-8} + 0.0001 \times 10^{-6} = 10^{-6}\,[0.009 + 0.0001] = 0.0091 \times 10^{-6} \text{ sec}$$

3.3 MEMORY CHARACTERISTICS

The properties to be considered when evaluating any memory technology are:

1) Cost : The price should include the cost of the information storage cells as well as the cost of the peripheral equipment or access circuitry essential for the operation of the memory.
cost = price of complete memory system / total bits of storage capacity.

2) Access Time : It is the time required to read or write a fixed amount of information, e.g. one word, from the memory. Access time depends upon the physical characteristics of the storage medium and also on the type of access mechanism used. It is usually calculated from the time a read request is received by the memory unit to the time the requested information is made available at the memory output terminals. The access rate, measured in words per second, is another widely used performance measure for storage devices. Thus, low cost and high access rates are desirable memory characteristics.

3) Access Modes : It is the order or sequence in which information can be accessed. Memory can be accessed randomly or sequentially. In random access memories each storage location can be accessed independently of the other locations, whereas in serial access memory storage locations can be accessed only in a certain predetermined sequence.

4) Alterability : Memories whose contents cannot be altered online are called Read Only Memories (ROMs). Memories in which reading or writing can be done online are called read-write memories. All memories used for temporary purposes are read-write memories.

5) Permanence of Storage : The physical processes involved in storage are sometimes inherently unstable, so that stored information may be lost over a period of time unless appropriate action is taken. There are three important memory characteristics that can destroy information: destructive readout, dynamic storage and volatility. In destructive readout, the memory contents are destroyed (erased) as the memory is read. Memories which require periodic refreshing are called dynamic memories. Static memories do not require refreshing. If the contents of memory are lost in case of power failure, the memory is termed volatile memory.

6) Cycle Time and Data Transfer Rate : The minimum time that must elapse between the initiations of two different memory accesses can be greater than the access time; this loosely defined term is called the cycle time of the memory. It is generally convenient to assume that the cycle time is the time needed to complete any read or write operation in memory.
The maximum amount of data that can be transferred per second is 1/tm, where tm is the cycle time; this is called the data transfer rate. The access time may be more important in measuring overall computer system performance, since it determines the length of time the processor must wait until initiating the next memory request.

7) Physical Characteristics : Many different physical properties of matter are used for information storage. The most important properties used for this purpose are classified as electronic, magnetic,
mechanical and optical. A factor determining the physical size of a memory unit is the storage density, measured in bits per unit area. In general, memories with no moving parts have much higher reliability than memories such as magnetic disks, which involve considerable mechanical motion.

3.4 SEMICONDUCTOR RAM MEMORIES

Semiconductor memories are available in a wide range of speeds. Their cycle times range from 100 ns to less than 10 ns.

3.4.1 Internal Organization of Memory Chips

Consider the following memory organization, which is organized in the form of an array in which each cell is capable of storing one bit of information.

Each row forms a memory word, and all cells of a row are connected to a common line called the word line, which is driven by the address decoder on the chip.
In each column, the cells are connected to a sense/write circuit by two bit lines. The sense/write circuits are connected to the data input/output lines of the chip.
During a read operation, these circuits sense, or read, the information stored in the cells selected by a word line and transmit this information to the output data lines.
During a write operation, the sense/write circuits receive input information and store it in the cells of the selected word.
The organization shown above in fig. 2 is an example of a very small memory chip consisting of 16 words of 8 bits each, referred to as a 16 × 8 organization.
The data input and the data output of each sense/write circuit are connected to a single bidirectional data line.
Two control lines, R/W and CS, are provided in addition to the address and data lines.
The R/W input specifies the required operation, and the CS input selects a given chip in a multi-chip memory system.
The circuit shown in fig. 2 above stores 128 bits and requires 14 external connections for address, data and control lines (4 address lines, 8 data lines and 2 control lines).
Semiconductor memories may be divided into bipolar and MOS (metal-oxide semiconductor) types.

Semiconductor RAM : In semiconductor memories, the basic storage cells are transistor circuits. Semiconductor memories fall into two main categories, static and dynamic.

Static RAM : These RAMs are composed of memory cells that resemble the flip-flops used in processor registers. In a dynamic RAM cell, the 1 and 0 states correspond to the presence or absence of a stored charge in a capacitor controlled by a transistor switching circuit. Since a dynamic RAM cell can be constructed around a single transistor, whereas a static cell requires up to six transistors, higher storage density is achieved with dynamic RAM designs. However, because the stored charge must be refreshed periodically, dynamic RAMs are more difficult to use than static RAMs. Unlike ferrite cores, semiconductor memories, both static and dynamic, are volatile, so that the stored information is lost when the power source is removed.
3.4.2 Random Access Memory (RAM)

Various features of RAM organization are:
The storage cells are physically arranged as rectangular arrays of cells.
The memory address is partitioned into d components, so that the address Ai of cell Ci becomes a d-dimensional vector (Ai,1, Ai,2, …, Ai,d) = Ai.

Example :
Design a 4M × 16 memory unit using 256K × 1 memory chips. Explain in detail the assumptions made while designing the system.
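As a rough check on the size of such a design, the number of chips and the split of the address lines can be computed with a small Python sketch. The function name and the returned field names are hypothetical, and the sketch assumes that all sizes are powers of two, that they divide evenly, and that a single decoder selects one row of chips at a time.

```python
def plan_memory_array(total_words, word_bits, chip_words, chip_bits):
    """Chip count and address-line split for building a memory from smaller chips.

    Assumes power-of-two sizes that divide evenly (an assumption of this sketch).
    """
    rows = total_words // chip_words          # chip groups, one selected at a time
    chips_per_row = word_bits // chip_bits    # chips side by side to form one word
    address_bits = (total_words - 1).bit_length()
    chip_address_bits = (chip_words - 1).bit_length()
    return {
        "chips": rows * chips_per_row,
        "rows": rows,
        "chips_per_row": chips_per_row,
        "address_bits": address_bits,
        "chip_address_bits": chip_address_bits,
        "decoder_input_bits": address_bits - chip_address_bits,
    }

K, M = 2**10, 2**20
plan = plan_memory_array(4 * M, 16, 256 * K, 1)
print(plan)
```

Under these assumptions, the 4M × 16 unit needs 16 rows of 16 chips (256 chips in total). Of the 22 address bits, 18 go to every chip and the remaining 4 drive a 4-to-16 decoder that generates the chip-select signals.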
Solution
substantially less access circuitry than the 1-D for a fixed amount of storage.
sense circuit at the end of the bit line generates the proper output value.

2. PROM
Some designs allow the data to be loaded by the user by providing a programmable ROM (PROM). This is achieved by inserting a fuse at point P in Fig. 9.
Before programming, the memory contains all 0’s. The programmer can insert 1’s at the required locations by burning out the fuses at these locations using high-current pulses. This process is irreversible.
PROMs are more flexible and convenient than ROMs. Because they can be programmed directly by the user, they are also more cost-effective than ROMs when only a small number is required.

3. EPROM
It is an erasable, reprogrammable ROM, i.e. the stored data can be erased and new data can be loaded. It provides flexibility while designing a digital system.
The structure of an EPROM cell is similar to the ROM cell shown in Fig. 9, but in an EPROM cell the connection to ground is always made at point P and a special transistor is used, which has the ability to function either as a normal transistor or as a disabled transistor that is always turned off.
Advantage : EPROM contents can be erased and reprogrammed.
Disadvantage : A chip must be physically removed from the circuit for reprogramming, and its entire contents are erased by the ultraviolet light.

Disadvantage: It requires different voltages for erasing, writing and reading the stored data.

3.4.4 CACHE MEMORY ARCHITECTURE AND WORKING

Cache memory is positioned logically between the CPU and main memory.
A cache’s storage capacity is less than that of main memory, but it has an access time of 1 to 3 cycles. Cache is much faster than main memory because, being small in size, some or all of it can reside on the same IC as the CPU.
Caches are essential components of high-performance computers and aim to minimize CPU wait time. Unlike the other memories in the hierarchy, the cache is transparent to the user.

3.4.4.1 Memory Hierarchy

3.4.4.2 Duplication:

In the memory hierarchy, whatever is held in a level closer to the CPU is always duplicated in the next level farther from it, i.e., whatever is present in cache memory is always also present in main memory.
For the user, the secondary memory acts as memory, but for the CPU it acts as an I/O device.

3.4.4.3 Cache Organization
to as a ‘tag’, so the cache knows to what part of the memory space the block belongs.

The tag addresses (those currently assigned to the cache, which can be non-contiguous) are stored in a special memory, the ‘cache tag memory’ or directory.

Example :
If Bj is a block containing data Dj in M1, then Bj is in the cache tag memory and Dj is in the cache data memory.
The cache memory is used to improve the performance of the computer. Hence, the access time of the cache should be less than that of main memory. Therefore, if main memory is implemented with DRAM technology having an access time tA2 = 50 ns, then the cache might be implemented with an SRAM technology having an access time tA1 = 10 ns.

But if a match for Ai is not found in the cache, then a cache miss occurs and Ai is matched with a main memory (M2) address.
In response to a cache miss, the block Bj containing address Ai is transferred from M2 to M1, i.e., copied from main memory to the cache.

(ii) Look through Cache
In this method, the CPU communicates with the cache memory via a separate bus that is isolated from the main system bus. With a look-through cache the CPU does not automatically send all requests to main memory. A request is sent to main memory only when a cache miss occurs.
During execution of instructions, each update is made only up to the cache level.
Main memory is updated only when a modified block is replaced: the cache block is copied to main memory, and the new main memory block is then transferred to the cache.

The overhead of the write-back policy can be reduced by a ‘dirty bit’ or ‘modified bit’.
If dirty bit = 1, the block in the cache has been modified and must be written to memory before it is replaced. If dirty bit = 0, the block is clean and there is no need to copy it to main memory.

3.4.6 MAPPING METHODS

Mapping is needed between main memory blocks and cache lines, since there are fewer cache lines than main memory blocks. Also, a means is needed for determining which main memory block currently occupies a cache line.
Three techniques can be used:
(i) Direct
(ii) Associative
(iii) Set associative

(i) Direct Mapping
It maps each block of main memory into only one possible cache line, and the mapping is expressed as
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
The mapping function is easily implemented using the address. For cache access, each main memory address can be viewed as consisting of three fields.
The least significant w bits identify a unique word or byte within a block of main memory.
The remaining s bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a tag of s - r bits (most significant) and a line field of r bits.

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of tag = (s - r) bits

Disadvantage
Each main memory block has only one fixed cache line in which it can be placed.

(ii) Associative Mapping
It overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
The cache control logic interprets the memory address simply as a tag and a word field. The tag field uniquely identifies a block of main memory.

Note : To determine whether a block is in the cache, the cache control logic must simultaneously examine every line’s tag for a match. No field in the address corresponds to the line number, so the number of lines in the cache is not determined by the address format.

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = undetermined
Size of tag = s bits

With associative mapping, there is flexibility as to which block to replace when a new block is read into the cache.
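The practical difference between the direct mapping rule i = j modulo m and associative placement can be seen in a small Python sketch that counts misses for a sequence of block references. The function names are hypothetical, and the associative version assumes a least-recently-used choice of victim, which is only one of several possible replacement rules.

```python
from collections import OrderedDict

def direct_mapped_misses(block_refs, m):
    """Count misses when block j may only live in line i = j mod m."""
    lines = [None] * m
    misses = 0
    for j in block_refs:
        i = j % m
        if lines[i] != j:      # tag mismatch: the line holds another block
            misses += 1
            lines[i] = j
    return misses

def associative_misses(block_refs, m):
    """Count misses when a block may live in any of the m lines.

    Replacement is least recently used (an assumption of this sketch).
    """
    lines = OrderedDict()      # block number -> True, ordered by recency of use
    misses = 0
    for j in block_refs:
        if j in lines:
            lines.move_to_end(j)
        else:
            misses += 1
            if len(lines) == m:
                lines.popitem(last=False)   # evict the least recently used block
            lines[j] = True
    return misses

refs = [0, 8, 0, 8, 0, 8]      # blocks 0 and 8 both map to line 0 when m = 8
print(direct_mapped_misses(refs, 8), associative_misses(refs, 8))
```

With m = 8, blocks 0 and 8 compete for the same line under direct mapping and every reference misses, while the associative cache misses only on the first reference to each block.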
Disadvantage
Associative mapping requires complex circuitry to examine the tags of all cache lines in parallel.

(iii) Set Associative Mapping
It is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
In this, the cache is divided into v sets, each of which consists of k lines.
m = v × k
i = j modulo v
where
i = cache set number
j = main memory block number
m = number of lines in the cache
This is referred to as k-way set associative mapping.
With set associative mapping, block Bj can be mapped into any of the lines of set i. In this case, the cache control logic interprets a memory address as three fields: tag, set and word.
The d set bits specify one of v = 2^d sets. The s bits of the tag and set fields together specify one of the 2^s blocks of main memory. With k-way set associative mapping, the tag in a memory address is much smaller and is only compared to the k tags within a single set.
Two-way set associative mapping (v = m/2, k = 2) significantly improves the hit ratio over direct mapping. Four-way set associative mapping (v = m/4, k = 4) makes a modest additional improvement for a relatively small additional cost. Further increase in the number of lines per set has little effect.

There are three possible mapping methods to specify where main memory blocks are placed in the cache:
Direct mapping method
Associative mapping method
Block-set associative mapping method
To discuss mapping methods, consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words, and assume that the main memory is addressable by a 16-bit address. For mapping purposes, the main memory will be viewed as composed of 4K blocks.

3.4.6.1 Direct Mapping
determines the cache position in which this block must be stored. The high-order five bits of the main memory address of the block are stored in five tag bits associated with its location in the cache.
As execution proceeds, the 7-bit cache block field of each address generated by the CPU points to a particular block location in the cache.
The tag field of that block is compared to the tag field of the address. If they match, then the desired word is in that block of the cache.
If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache. The direct mapping method is easy to implement, but it is not very flexible.

3.4.6.3 Set-Associative Mapping Technique

This is a combination of the two techniques, direct mapping and associative mapping. Here, blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set.
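The way a 16-bit address is divided into fields can be sketched in Python as follows. The direct-mapped split uses the 5-bit tag and 7-bit block field described above, with a 4-bit word field assumed for blocks of 16 words. The set-associative split assumes two blocks per set (64 sets, hence a 6-bit set field and a 6-bit tag); that set size, like the function names, is an assumption made only for this illustration.

```python
def split_direct(address, word_bits=4, block_bits=7):
    """Split an address into (tag, block, word) fields for direct mapping."""
    word = address & ((1 << word_bits) - 1)
    block = (address >> word_bits) & ((1 << block_bits) - 1)
    tag = address >> (word_bits + block_bits)
    return tag, block, word

def split_set_associative(address, word_bits=4, set_bits=6):
    """Split an address into (tag, set, word) fields.

    set_bits=6 corresponds to 64 sets of two blocks each (assumed for this sketch).
    """
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)
    return tag, set_index, word

address = 0xABCD
print(split_direct(address))            # (tag, block, word)
print(split_set_associative(address))   # (tag, set, word)
```

In the direct-mapped case the block field alone picks the cache position and a single stored tag is compared. In the set-associative case the set field picks a set, and the tag is compared with the tag of every block in that set.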
referenced, but determining which blocks are about to be referenced is difficult. Since programs usually reside in localized areas for reasonable periods of time, there is a high probability that blocks that have been referenced recently will be referenced again.
When a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block and the technique is called the LRU replacement algorithm.
To perform the LRU function, the cache controller must record references to all blocks as computation progresses.

For Example :
It is required to track the LRU block of a four-block set in a set-associative cache. A 2-bit counter can be used for each block. When a hit occurs, the counter of the block that is referenced is set to 0. Counters with values originally lower than the referenced one are incremented by one, and all others remain unchanged. When a miss occurs and the set is not full, the counter associated with the new block loaded from the main memory is set to 0, and the values of the other counters are increased by one. When a miss occurs and the set is full, the block with the counter value 3 is removed, the new block is put in its place, and its counter is set to 0. The other three block counters are incremented by one. It can be easily verified that the counter values of occupied blocks are always distinct.

The LRU algorithm has been used extensively. Although it performs well for many access patterns, it can lead to poor performance in some cases. Performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block to replace.

3.4.7 Memory Interleaving

If the main memory of a computer is structured as a collection of physically separate modules, each with its own address buffer register (ABR) and data buffer register (DBR), then memory access operations may proceed in more than one module at the same time.

This type of memory is known as multiple-module memory. Here, the average rate of transmission of words to and from the total main memory system can be increased. In multiple-module memory the modules can be addressed in two ways.
In the first method, the main memory address generated by the CPU is decoded as shown in Fig. 16.

In figure 16, the high-order k bits name one of n modules and the low-order m bits name a particular word in that module. If the CPU issues Read requests to consecutive locations, as it does when fetching instructions of a straight-line program, then only one module is kept busy by the CPU. However, devices with direct memory access (DMA) ability may be accessing information in other memory modules.
The second and more effective method to address the modules is shown in Fig. 17. In this, the low-order k bits of the main memory address select a module and the high-order m bits name a location within that module. In this way, consecutive addresses are located in successive modules. This is called memory interleaving.
In this, any component of the system that generates requests for access to consecutive main memory locations can keep a number of modules busy at any given time. This results in a higher average utilization of the memory system as a whole.
To take advantage of memory interleaving, the CPU or the DMA device must be capable of initiating a memory access operation while waiting for a previous memory access to be completed.
In memory interleaving there must be 2^k modules; otherwise, there will be gaps of non-existent locations in the main memory address space. With the first method, an existing system can be expanded simply by adding one or more modules as required.
But with the second method, the system must always have the full set of 2^k modules, and a failure in any module affects all areas of the address space. A failed module in the first system affects only a localized area of the address space.

Depending on which address lines are given to the IC and which to the decoder, this interleaved memory is divided into two types.

(i) Low-order Interleaving Memory

In this memory, the low-order address lines are given to the 8K RAM ICs, while the CPU gives the three higher-order address lines to the inputs of a decoder, which selects one of the RAM ICs. When A13 A14 A15 = 000, the first IC is selected, and the memory location within that IC is selected by lines A0 - A12.

(ii) Higher-Order Interleaved Memory

In this memory system, the higher-order address lines, i.e. A3 - A15, are given to the ICs and the low-order address lines are given by the CPU to the decoder to select the IC.
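The effect of choosing the module-select bits from the high end or the low end of the address can be seen in a short Python sketch. The function names are hypothetical, and k = 2 (four modules) with m = 4 (16 words per module) are assumed values chosen only to keep the example small.

```python
def high_order_select(address, k, m):
    """High-order k bits select the module, low-order m bits the word in it."""
    module = address >> m
    word = address & ((1 << m) - 1)
    return module, word

def low_order_select(address, k, m):
    """Low-order k bits select the module, high-order m bits the word in it."""
    module = address & ((1 << k) - 1)
    word = address >> k
    return module, word

k, m = 2, 4   # 4 modules of 16 words each (assumed for illustration)
consecutive = range(4)
print([high_order_select(a, k, m)[0] for a in consecutive])
print([low_order_select(a, k, m)[0] for a in consecutive])
```

When the high-order bits select the module, four consecutive addresses all fall in module 0. When the low-order bits select the module, the same addresses are spread over modules 0 to 3, so several modules can be busy at once.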
When a program does not completely fit into the main memory, the parts of it not currently being executed are stored in secondary storage devices. The techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual memory techniques. The binary addresses that the processor issues for either instructions or data are called virtual or logical addresses. These addresses are translated into physical addresses by a combination of hardware and software components.
Figure 18 shows the typical organization that implements virtual memory. The memory management unit (MMU) is a hardware device that translates virtual addresses into physical addresses.

Programs and data are divided into fixed-length units called pages. Each page consists of a block of words that occupies contiguous locations in the main memory. Page length is normally 2K to 10K bytes.
Virtual memory address translation is based on fixed-length pages. Each virtual address generated by the processor contains a virtual page number and an offset. Information about the main memory location of each page is kept in a page table. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the content of this register, the address of the corresponding entry in the page table is obtained. Each entry in the page table also includes some control bits that describe the status of the page while it is in main memory. The page table information is used by the MMU for every read and write access. A small cache, usually called the Translation Look-aside Buffer (TLB), is incorporated into the MMU. It consists of the page table entries that correspond to the most recently accessed pages.
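The translation step performed with the page table can be sketched in Python as follows. This is a simplified model, not a description of any real MMU: the page size of 4096, the dictionary used as a page table, the entry fields frame and present, and the PageFault exception are all assumptions of this sketch, and the TLB is left out.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class PageFault(Exception):
    """Raised when the referenced page is not in main memory."""

def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Translate a virtual address into a physical address using a page table."""
    page_number, offset = divmod(virtual_address, page_size)
    entry = page_table.get(page_number)
    if entry is None or not entry["present"]:
        raise PageFault(page_number)
    # The page frame number replaces the page number; the offset is unchanged.
    return entry["frame"] * page_size + offset

# Hypothetical page table: page 0 is in frame 5, page 1 is not in main memory.
page_table = {
    0: {"frame": 5, "present": True},
    1: {"frame": 0, "present": False},
}

physical = translate(100, page_table)
print(physical)

try:
    translate(PAGE_SIZE + 1, page_table)
except PageFault as fault:
    print("page fault on page", fault.args[0])
```

A TLB could be modeled as a second, smaller dictionary that is consulted before the page table and holds the entries of the most recently accessed pages.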
Virtual memory is commonly implemented by demand paging. It can also be implemented in a segmentation system; several systems provide a paged segmentation scheme, where segments are broken into pages. Demand segmentation can also be used to provide virtual memory.

3.7 Paging, Segmentation and Paged Segments

Main and secondary memory form another two-level hierarchy. This interaction is managed by the operating system. However, it is not transparent to system software but somewhat transparent to the user code. The term 'virtual memory' is applied when main and secondary memory appear to a user program as a single, large and directly addressable memory.
Three reasons for using virtual memory:
1. To free the user from the need to carry out storage reallocation and to permit the efficient sharing of available memory space by different users.
2. To make the program independent of the configuration and capacity of the physical memory used for execution.
3. To achieve the very low cost per bit and low access time that are possible with a memory hierarchy.
The program is divided into a number of blocks of virtual memory, which together are known as the 'virtual address space'.
A page is a fixed-length block which can be assigned to fixed regions of physical memory called 'page frames'.
Division of the physical memory space into equal-size blocks gives the 'page frames'. The size of a page frame is equal to the size of a page. Dividing the virtual address space into equal-sized blocks is known as 'Paging'.
In a pure paging system, each virtual address consists of two parts: a page address (number) and a displacement.
Translation of physical space is p.n
[Figure: virtual address format: page address | displacement]
The memory map, now referred to as a 'page table', consists of the following information.

Page address   Page frame   Presence bit P   Change bit C   Access rights
A              000000       1                0              R, X
C              06C7F9       0                1              R, W, X

Each virtual page address has a corresponding real address of a page frame in main or secondary memory. When the presence bit P = 1, the required page is in main memory and the base address of its page frame is stored in the page table. If P = 0, a page fault occurs. The change bit C indicates whether or not the page has been changed since it was last loaded into main memory.
The page table can also contain memory protection data that specifies the access rights of the current program to read from, write into or execute the page.
Since main memory is divided into contiguous page frames of equal size, any free frame can hold any page, so no external fragmentation exists in paging. But if a K-word block is divided into P n-word pages and K is not a multiple of n, the last page frame to which the block is assigned will not be filled. Unusable space within the partially filled page frame is 'internal fragmentation'.
Advantages
The chief advantage of paging is that data transfer between memory levels is simplified: an incoming page can be assigned to any available page frame.
No external fragmentation problem, as page frames are contiguous and of equal size.
Paging is hidden from the user.
This can be used in multiprogramming.
Disadvantages
Protection facility is not available in paging.
Internal fragmentation is a problem in paging.

3.7.1 Process of address translation in a Virtual Memory System
The simplest method for translating a virtual address into a physical address is to assume that all programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory.
Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer (10 or 20 milliseconds) than the access time of the main memory.
The reason for this is that it takes a considerable amount of time to locate the data on the disk, but once located, the data can be transferred at a rate of several megabytes per second.
On the other hand, if pages are too large, it is possible that a substantial portion of a page may not be used, yet this unnecessary data will occupy valuable space in the main memory.
A virtual memory address translation method based on the concept of fixed-length pages is shown in Fig. 19. Each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that specifies a location within a page. Information about the main memory location of each page is kept in a page table. This information includes the main memory address where the page is stored and the current status of the page. An area in the main memory that can hold one page is called a page frame.
The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory.
Each entry in the page table also includes some control bits that describe the status of the page while it is in the main memory. One bit indicates the validity of the page, i.e., whether the page is actually loaded in the main memory. This bit allows the operating system to invalidate the page without actually removing it. Another bit indicates whether the page has been modified during its residency in the memory. Other control bits indicate various restrictions that may be imposed on accessing the page.
Example
A virtual memory system has a 16K-word logical address space and an 8K-word physical address space, with a page size of 2K words.
The page address trace of a program has been found to be:
7 5 3 2 1 0 4 1 6 7 4 2 0 1 3 5
List the four pages resident in the memory after each page reference for the following replacement policies:
(i) FIFO (ii) LRU
Solution
The physical address space holds 8K / 2K = 4 page frames. A * marks a reference that finds its page already resident (no page fault).
Reference: 7  5  3  2  1  0  4  1* 6  7  4* 2  0  1  3  5
(i) FIFO
Frame 1:   7  7  7  7  1  1  1  1  1  7  7  7  7  7  3  3
Frame 2:   -  5  5  5  5  0  0  0  0  0  0  2  2  2  2  5
Frame 3:   -  -  3  3  3  3  4  4  4  4  4  4  0  0  0  0
Frame 4:   -  -  -  2  2  2  2  2  6  6  6  6  6  1  1  1
(ii) LRU
Frame 1:   7  7  7  7  1  1  1  1  1  1  1  2  2  2  2  5
Frame 2:   -  5  5  5  5  0  0  0  0  7  7  7  7  1  1  1
Frame 3:   -  -  3  3  3  3  4  4  4  4  4  4  4  4  3  3
Frame 4:   -  -  -  2  2  2  2  2  6  6  6  6  0  0  0  0
The O/S maintains a page table for each process. The page table shows the frame location for each page of the process. Each logical address consists of a page number and a relative address within the page. In paging, the logical to physical address translation is done by CPU hardware.
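The two replacement policies from the example can be checked with a short simulation. The sketch below is an illustration only; the function names `fifo` and `lru` and the choice of returning the sorted resident set after every reference are assumptions made for this example.

```python
from collections import OrderedDict, deque

TRACE = [7, 5, 3, 2, 1, 0, 4, 1, 6, 7, 4, 2, 0, 1, 3, 5]
FRAMES = 4   # 8K-word physical space / 2K-word pages


def fifo(trace, frames):
    """Return (resident pages after each reference, number of page faults)."""
    queue, resident, history, faults = deque(), set(), [], 0
    for page in trace:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(queue.popleft())   # evict the oldest loaded page
            resident.add(page)
            queue.append(page)
        history.append(sorted(resident))
    return history, faults


def lru(trace, frames):
    """Return (resident pages after each reference, number of page faults)."""
    recency, history, faults = OrderedDict(), [], 0
    for page in trace:
        if page in recency:
            recency.move_to_end(page)               # a hit refreshes the page
        else:
            faults += 1
            if len(recency) == frames:
                recency.popitem(last=False)         # evict least recently used
            recency[page] = True
        history.append(sorted(recency))
    return history, faults


fifo_history, fifo_faults = fifo(TRACE, FRAMES)
lru_history, lru_faults = lru(TRACE, FRAMES)
```

The two policies first diverge at the tenth reference (page 7): FIFO evicts page 1 because it was loaded earliest, while LRU keeps page 1 because it was referenced again and evicts page 0 instead.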
3.7.2 Implementation Methods
i. Paged Memory System
ii. Demand Paged Memory System

i. Paged Memory System
A paging system uses fixed-length blocks called pages and assigns them to fixed regions of physical memory called page frames. The main advantage of paging is that memory allocation is greatly simplified, since an incoming page can be assigned to any available page frame.
Physical memory is broken into fixed-size blocks called frames.
Logical memory is also broken into blocks of the same size called pages.
When a program is to be executed, its pages are loaded into any available frames and the page table is defined to map user pages to memory frames.
In Fig. 20(a), some of the frames in memory are in use and some are free. The list of free frames is maintained by the operating system. Process A, stored on disk, consists of four pages. When it is time to load this process, the O/S finds four free frames and loads the four pages of process A into the four frames.
Now the CPU must know how to access the page table of the current process. Presented with a logical address consisting of a page number and a relative address, the CPU uses the page table to produce a physical address consisting of a frame number and a relative address, as shown in the figure below.
Hence, paging overcomes a lot of problems. Main memory is divided into many small, equal-size frames. Each process is divided into frame-sized pages. A smaller process requires fewer pages, a larger process requires more. When a process is brought in, its pages are loaded into available frames and a page table is set up.
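Loading a process into free frames and then mapping a logical address to a physical one can be sketched as below. The page size, the free-frame numbers and the helper names `load_process` and `to_physical` are hypothetical values chosen for illustration; they do not come from Fig. 20.

```python
PAGE_SIZE = 2048   # words per page; assumed for illustration


def load_process(num_pages, free_frames):
    """Assign any free frames to the pages; returns the page table (page -> frame)."""
    if num_pages > len(free_frames):
        raise MemoryError("not enough free frames")
    return {page: free_frames.pop(0) for page in range(num_pages)}


def to_physical(logical_address, page_table):
    """Split the logical address, look up the frame, and rebuild the address."""
    page, relative = divmod(logical_address, PAGE_SIZE)
    frame = page_table[page]
    return frame * PAGE_SIZE + relative


free_frames = [13, 14, 15, 18, 20]           # hypothetical free-frame list
table_a = load_process(4, free_frames)       # process A has four pages
physical = to_physical(1 * PAGE_SIZE + 100, table_a)   # page 1, relative address 100
```

Note that the frames given to a process need not be adjacent (here 13, 14, 15 and 18); only the relative address within the page is carried over unchanged.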
ii. Demand Paging
A demand paging system is similar to a paging system with swapping, which is shown in the figure.
Consider what happens if the process tries to use a page that was not brought into memory. Access to a page marked invalid causes a page fault trap. The paging hardware will notice that the invalid bit is set, causing a trap to the operating system. This trap is the result of the operating system's failure to bring the desired page into memory.
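The page fault path described above can be sketched as a small handler. This is a simplified illustration: the dictionary layout of the page table, the `access` function and the assumption that a free frame is always available are hypothetical simplifications (a real system would also read the page from disk and might have to pick a victim page).

```python
def access(page, page_table, free_frames, trap_log):
    """Return the frame holding `page`, bringing it in on a page fault."""
    entry = page_table[page]
    if not entry["valid"]:                    # page marked invalid: trap to the OS
        trap_log.append(page)                 # record the page fault
        entry["frame"] = free_frames.pop(0)   # assumes a free frame exists
        entry["valid"] = True                 # page is now resident
    return entry["frame"]


page_table = {p: {"valid": False, "frame": None} for p in range(3)}
free_frames = [7, 9]
trap_log = []
first = access(1, page_table, free_frames, trap_log)    # faults, page is loaded
again = access(1, page_table, free_frames, trap_log)    # resident now, no trap
```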
beginning, to read a physical record, through n-1, one at a time. If the information desired is near the end of the tape, the program will have to read almost the entire tape, which may take several minutes. Forcing a CPU that can execute millions of instructions per second to wait 200 sec while a tape is advanced is wasteful. Tapes are most appropriate when the data must be accessed sequentially.
helps to transfer information between main memory and the disk.
With two sets of heads, a given sector will always appear under one set of heads or the other within at most one-half of the rotation period. Fixed-head disks are logically similar to drums in that the heads do not move, but they have the physical appearance of a disk.
Fixed disks are often combined with removable ones, the fixed disk for normal use and the removable one for making backups. Drums are never combined with anything else in one device.

3.8.4 Optical Memories

Optical memories have become available. They have much higher recording densities than conventional magnetic media.
For example: A strip of ordinary 35 mm black and white film 3 feet long can hold more information than a 2400 ft magnetic tape.
An especially interesting optical memory is the video disk. Although these disks were originally developed for recording television programs, they can be put to more esthetic use as computer storage devices.
The disks are inherently digital, with the information recorded as a sequence of bits burned into the surface by an electron beam or laser. One characteristic, however, that limits their application is that once a video disk has been written, it cannot be erased as a magnetic disk can be.

Example
A Winchester magnetic disk unit has a recording density of 40 × 10^6 bits per square inch of surface.
i) If the inner diameter of the recording area is 4 inches and the outer diameter is 7 inches, what is the average bit density along a track if the radial track density is 2000 tracks/inch?
ii) What is the data transfer rate in bytes/sec at a rotational speed of 3600 rpm?
Solution
Given:
Number of bits per square inch (density) = 40 × 10^6
Inner diameter = 4 inches
Outer diameter = 7 inches
The total recording area = area (outer circle) - area (inner circle)
= π/4 × (7^2 - 4^2)
= 25.90 sq. inches.
Total number of bits = bit density × recording area
= 40 × 10^6 × 25.90
= 1036 × 10^6 bits
Track density = 2000 tracks/inch
Tracks are spaced along the radius, so the radial width of the recording area is (outer diameter - inner diameter)/2.
Total number of tracks = 2000 × (7 - 4)/2 = 3000.
Average number of bits per track = total number of bits / total number of tracks
= (1036 × 10^6) / 3000
= 345 × 10^3 bits per track
(equivalently, the linear density along a track is 40 × 10^6 / 2000 = 20 × 10^3 bits/inch)
Rotational speed = 3600 rpm = 60 rev/sec
Data transfer rate = bits per track × revolutions per second
= 345 × 10^3 × 60
= 20.7 × 10^6 bits/sec
= 2.59 × 10^6 bytes/sec
= 2.59 MB/sec.
Ans:
i) Average bit density along a track = 345 × 10^3 bits/track (20 × 10^3 bits/inch).
ii) Data transfer rate = 2.59 MB/sec.

Example
A high-speed tape system accommodates a 2400 ft reel of standard 9-track tape. The tape is moved past the recording head at the rate of 150 inches/sec.
i) What must be the linear tape recording density in order to achieve a data transfer rate of 10^6 bits/sec?
ii) If the tape is organized into blocks of 32 KB and a gap of 0.4 inches separates the blocks, what is the storage capacity of the tape?
Solution
Given: Data transfer rate = 10^6 bits/sec, tape speed = 150 inches/sec.
i) Tape speed (inches/sec) = transfer rate (bits/sec) / recording density (bits/inch)
Recording density = (10^6 bits/sec) / (150 inches/sec) = 6667 bits/inch
Recording density = 6667 bpi.
ii) Total tape length = 2400 × 12 = 28800 inches
Length of one block = (32 × 1024 × 8 bits) / 6667 bpi = 39.32 inches
Length of one block plus its gap = 39.32 + 0.4 = 39.72 inches
Number of whole blocks = 28800 / 39.72 = 725
Storage capacity = 725 × 32 KB = 23200 KB, i.e. about 23.2 MB.
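The arithmetic of the two storage examples above can be reproduced with a few lines of code. The sketch makes its modelling assumptions explicit: tracks are counted across the radial width of the recording band (outer radius minus inner radius), the disk transfer rate is taken as bits per track times revolutions per second, and every tape block is followed by one inter-block gap. The function names are hypothetical.

```python
import math


def disk_example(areal_density=40e6, inner_d=4.0, outer_d=7.0,
                 tracks_per_inch=2000, rpm=3600):
    """Return (area, tracks, bits_per_track, bytes_per_sec) for the disk example."""
    area = math.pi / 4 * (outer_d ** 2 - inner_d ** 2)      # square inches
    total_bits = areal_density * area
    tracks = tracks_per_inch * (outer_d - inner_d) / 2      # radial width in inches
    bits_per_track = total_bits / tracks
    bytes_per_sec = bits_per_track * (rpm / 60) / 8
    return area, tracks, bits_per_track, bytes_per_sec


def tape_example(length_ft=2400, speed_ips=150, rate_bps=1e6,
                 block_bytes=32 * 1024, gap_in=0.4):
    """Return (density_bpi, whole_blocks, capacity_bytes) for the tape example."""
    density_bpi = rate_bps / speed_ips                      # bits per inch
    block_in = block_bytes * 8 / density_bpi                # inches per block
    blocks = int((length_ft * 12) // (block_in + gap_in))   # block plus one gap
    return density_bpi, blocks, blocks * block_bytes


area, tracks, bits_per_track, disk_rate = disk_example()
density, blocks, capacity = tape_example()
```

Because the code keeps full precision for the recording area instead of rounding to 25.90 square inches, its intermediate values differ slightly in the last digits from a hand calculation.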
4 INPUT AND OUTPUT UNIT
An I/O device and a memory location can have the same address bit pattern without conflict.
I/O mapped I/O defines separate I/O address space and memory address space.
It uses separate control signals for memory and I/O devices. These are: memory read, memory write, I/O read, I/O write.
It uses dedicated instructions for I/O operations, e.g. IN, OUT. I/O data transfer is always with respect to the accumulator (a register) only.
The memory address space is not reduced in this case.
ALU operations cannot be directly performed on port data.
Advantages
1. Data transfer is controlled by software.
2. I/O device does not have direct access to memory.
Disadvantages:
1. This method is useful in small, low-speed computers only.
2. The CPU wastes time while checking the flag.
instructions which are used to initiate and terminate the execution of an IO program. It is used to test the status of the IO device. The IO processor is initialized to execute IO data transfer operations. An IOP like the 8089 is provided with two DMA channels, which are used for data transfer operations using the memory bus when the CPU does not require it.
The algorithm is as follows:
WAIT:
If Attention = 1 then begin: fetch parameters from IOCR.
SET UP:
Set up the DMA control register. Begin IO program execution; send command to the IO device.
4.3 DIRECT MEMORY ACCESS
The hardware required to design Direct
Memory Access is as shown below:
4.3.1 DMA TRANSFER BLOCK

In DMA mode, the DMA controller is the master and controls the memory bus. This mode is needed by secondary memories like disk drives, where data transmission cannot be stopped or slowed without loss of data and where blocks are transferred. Block DMA transfer supports faster I/O data transfer rates, but the CPU remains inactive for relatively long periods because the system bus is tied up.

4.3.2 CYCLE STEALING
This is an alternative method to DMA block transfer. In this method, the system allows the DMA controller to use the system bus to transfer one word, after which it should return control of the bus to the CPU.
A block of I/O data transferred using the cycle stealing method has the DMA controller bus transactions interleaved with CPU bus transactions.
This method reduces the maximum I/O transfer rate. It also reduces the interference of the DMA controller in CPU memory access.
This interference can be completely eliminated by designing the DMA interface so that system bus cycles are stolen only when the CPU is not actually using the system bus. This is called "Transparent DMA".

4.3.3 STEPS INVOLVED IN DMA SYSTEMS ARE

1. The CPU executes two I/O instructions, which load the DMA registers IOAR and DC with their initial values. The IOAR is loaded with the base address of the region of memory used for the transfer. The DC will contain the number of words to be transferred to or from the memory.
2. When the DMA controller is ready to transmit or receive data, it will activate the DMA REQUEST signal to the CPU. The CPU will wait for the next DMA breakpoint, then it will activate the DMA ACKNOWLEDGEMENT signal.
3. The DMA controller now transfers data to or from the main memory. After a word is transferred, it updates the DC and IOAR registers.
4. If DC is not yet zero and the I/O device is not ready to send or receive data, then the DMA controller releases the system bus to the CPU by deactivating the REQUEST line. The CPU responds to the DMA controller by deactivating the DMA ACKNOWLEDGEMENT line.
5. If DC reaches zero, then the DMA controller should stop the transfer and send an interrupt request signal to the CPU; the CPU responds by halting the I/O device or by initiating a new DMA transfer.

4.3.4 NEED OF DMA

A modest increase in hardware enables an I/O device to transfer a block of information to or from 'M' (memory) without CPU intervention. To enable this, the I/O device has to generate memory addresses and transfer data to or from the bus, with the CPU only initiating each block transfer. Hence the I/O device requires an interface between I/O data and main memory that can carry out the transfer without program execution by the CPU. Such an I/O device interface circuit is called a DMA controller, and this level of I/O control is called Direct Memory Access (DMA).

4.3.5 WORKING OF DMA & ITS BENEFITS

DMA comes into action due to the following drawbacks of interrupt driven I/O and simple programmed I/O.

Drawbacks
1. Data transfer must traverse a path through the CPU.
2. I/O transfer rate is limited by the speed with which the CPU can test and service a device.
3. The CPU is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.
When large volumes of data are to be moved, a more efficient technique is required, namely DMA.

4.3.6 DMA FUNCTION
DMA involves an additional module on the system bus called the "DMA module", which is capable of taking over control of the system bus from the CPU.

4.3.7 Working
1. When the CPU wishes to read or write a block of data, it issues a command to the DMA module, which includes the following information:
a) Whether a read or write is requested
b) The address of the I/O device involved
c) The starting location in memory to read from or write to.
d) The number of words to be read or written.
2. After sending the above information, the CPU continues with its work.
3. The DMA module transfers the entire block of data, one word at a time, directly to or from memory, without going through the CPU.
4. When the transfer is complete, the DMA module sends an interrupt signal to the CPU.
Thus, the CPU is involved only at the beginning and end of the transfer.

4.3.8 DMA Data Transfer
It is a data transfer technique that moves data directly between memory and an I/O device without CPU intervention.
Data is directly transferred between memory and I/O devices under the supervision of extra hardware called the DMA controller (like the 8237).
It is the fastest type of data transfer technique among this parallel group.
Initialization of the DMA controller is done by loading the memory address and word count into the channel register of the DMA controller to which the I/O device is connected.
The I/O device initiates a DMA operation by sending a DMA request (DRQ) to the DMA controller.

4.4 STEPS INVOLVED IN THE DMA OPERATION:
1. I/O device asserts the DRQ signal.
2. DMA controller sends the HOLD signal to the microprocessor.
3. μP sends HLDA (acknowledgement signal) back to the DMA controller, and the DMA controller takes charge of the system bus.
4. DMA controller gives the DMA acknowledge (DACK) signal to the corresponding I/O device.
5. Now, the DMA controller places the memory address on the address bus. It reads the data bytes from the memory and transfers them to the I/O device.
6. DMA controller updates the memory address register and word count register.
7. When the internal count becomes zero, the DMA controller sets HOLD = 0 (an indication to the μP that the DMA operation is over).
8. Now the processor regains charge of the system bus and continues with normal operation.
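The eight steps above can be traced with a small simulation of a memory-to-device transfer. This is a sketch for illustration only: the function name `dma_memory_to_device`, the event strings and the modelling of memory as a Python list are assumptions, and the sketch assumes HOLD is simply deasserted (set to 0) when the count reaches zero.

```python
def dma_memory_to_device(memory, start_address, word_count):
    """Trace the DMA steps for a memory-to-I/O transfer.

    Returns (events, delivered) where events lists the handshake signals
    in order and delivered holds the words handed to the I/O device.
    """
    events, delivered = [], []
    events.append("DRQ asserted by I/O device")                    # step 1
    events.append("HOLD = 1 sent to microprocessor")               # step 2
    events.append("HLDA received, DMA controller owns the bus")    # step 3
    events.append("DACK sent to I/O device")                       # step 4

    address, count = start_address, word_count
    while count > 0:
        delivered.append(memory[address])    # step 5: read memory, hand to device
        address += 1                         # step 6: update address register
        count -= 1                           #         and word count register

    events.append("HOLD = 0, DMA operation over")                  # step 7 (assumed)
    events.append("microprocessor regains the system bus")         # step 8
    return events, delivered


memory = list(range(100, 110))               # hypothetical memory contents
events, delivered = dma_memory_to_device(memory, start_address=2, word_count=3)
```

The processor appears only in the handshake events; the data words themselves move from memory to the device without passing through it.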
There are various types of DMA transfer
modes as follows:
1. Byte/Cycle stealing mode.
2. Burst/Demand mode.
3. Continuous/Block mode.
4.4.1 BYTE/CYCLE STEALING MODE

When the DMA controller takes charge of the system bus, 1 byte is transferred between memory and the I/O device, and the system bus is then given back to the microprocessor. This is possible by stealing a CPU cycle when the processor is not using the system bus.
This is also called Hidden DMA.
The flowchart for byte mode DMA is as follows:
Byte mode DMA exhibits high system performance by carrying out two data operations simultaneously.

Example:
File transfer between memory & printer. The printer prints the buffer contents at its own speed. If the data file to be printed is 10 Kbytes and the internal printer buffer is 4 Kbytes, then demand mode DMA is performed in 3 bursts (4 Kbytes, 4 Kbytes & 2 Kbytes).

4.4.3 CONTINUOUS/BLOCK MODE DMA

This DMA mode is used for transfers between memory and fast I/O devices. After gaining control of the system bus, the entire data block is transferred between memory and the fast I/O device. Once the entire block has been transferred, control of the system bus is given back to the μP.
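The difference between the three modes can be expressed as how many times the DMA controller must be granted the bus for one transfer. The sketch below is a simplified illustration; the function names `demand_mode_bursts` and `bus_grants` are hypothetical, and it assumes that in demand mode each burst is limited by the device buffer size.

```python
def demand_mode_bursts(total_bytes, buffer_bytes):
    """Split a transfer into bursts no larger than the device buffer."""
    bursts, remaining = [], total_bytes
    while remaining > 0:
        burst = min(buffer_bytes, remaining)
        bursts.append(burst)
        remaining -= burst
    return bursts


def bus_grants(total_bytes, mode, buffer_bytes=None):
    """Number of times the DMA controller takes over the bus for one transfer."""
    if mode == "byte":          # cycle stealing: bus returned after every byte
        return total_bytes
    if mode == "demand":        # burst: bus returned after each buffer-full
        return len(demand_mode_bursts(total_bytes, buffer_bytes))
    if mode == "block":         # continuous: bus held for the whole block
        return 1
    raise ValueError(f"unknown mode: {mode}")


printer_bursts = demand_mode_bursts(10 * 1024, 4 * 1024)
```

For the printer example, the 10 Kbyte file with a 4 Kbyte buffer yields bursts of 4096, 4096 and 2048 bytes.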
In this method, the CPU stays in a program loop until the device indicates that it is ready for transfer of data.
It is a time-consuming process and keeps the processor busy needlessly.
This problem can be avoided by using an interrupt facility and special commands to inform the interface to issue an interrupt request signal when the data are available from the device.
Transfer of data under programmed I/O is between the CPU and the peripheral.
The CPU initiates the transfer by supplying the interface with the starting address.

4.5 INTERRUPT-INITIATED I/O
Interrupt-initiated I/O transfer mode uses the interrupt facility.
In this transfer mode, the interface constantly monitors the flag and informs the computer when the device is ready for transfer.
In this transfer mode, the CPU responds to the interrupt signal by storing the return address from the program counter into a memory stack, and then control branches to a service routine that processes the required I/O transfer.
There are two methods of choosing the branch address of the service routine: vectored interrupt and non-vectored interrupt.

A DMA controller takes control over the bus to manage the transfer directly between the I/O device and memory.
There are various transfer modes possible:
i) Burst transfer mode:
In this, a block sequence consisting of a number of memory words is transferred in a continuous burst.
ii) Cycle stealing mode:
It allows the DMA controller to transfer one data word at a time, after which it must return control of the bus to the CPU.

4.6 DATA TRANSFER TECHNIQUES
Data transfer techniques are classified into two broad categories:
1) Intra System (implemented using parallel I/O).
2) Intersystem (implemented using serial I/O).
Parallel I/O can be further classified into three categories:
1) Programmed/Polled I/O
2) Interrupt driven I/O
3) DMA
Similarly, serial I/O can be classified in two groups as per the data format which is used.
1) Asynchronous serial
2) Synchronous serial

4.6.1 PARALLEL DATA TRANSFER
It is used for short distance communication, generally between various components in a single computer system. Hence it is called Intra System Communication.
It uses a parallel data bus for communication, usually 8/16/32 bits depending on the μP which is used.
In this, data transfer takes place one byte/word/double word at a time through the parallel data bus.
The parallel bus connects all components in a single computer system.
The problem of cross talk/interference arises as the distance between devices increases.
The cost of a parallel bus also increases as the distance increases.

4.6.2 PROGRAMMED/POLLED DATA TRANSFER
The microprocessor executes a program for a data transfer between any two devices in the system.
Consider the transfer of data from memory to an I/O device.
In this, every byte to be transferred is taken from memory into the μP; then, by executing an OUT instruction, that byte is transferred to the I/O device via an I/O port.
During the transfer of every byte, the processor also checks the ready condition of the corresponding I/O device. If the I/O device is not ready, the processor keeps on polling that I/O device until it becomes ready.
When the I/O device becomes ready, the processor completes that data transfer.
Advantages
Implementation of parallel data transfer is very simple.
Disadvantages
μP time is wasted in polling the I/O device ready status. Generally I/O devices are slower than memory or the processor, so most of the processor time is wasted in polling I/O devices.
The microprocessor is involved in the operation.

4.6.3 INTERRUPT DRIVEN DATA TRANSFER

In this technique the processor does not check the ready status of the I/O device.
The I/O device itself sends an interrupt signal (INT) to the microprocessor when it becomes ready for data transfer.
The μP performs its ordinary data processing task (main program) while the I/O device is not ready for the data transfer operation.
When the I/O device becomes ready, it sends an interrupt signal (INT) to the μP. The μP's control is then transferred to the Interrupt Service Routine (ISR).
The ISR simply takes a data byte from memory and transfers it to the ready I/O device through the I/O port by executing an OUT instruction.
The ISR transfers control back to the main program by a return operation (RET).
The processor again continues the execution of the main program for the period during which the I/O device is not ready.
The I/O device may again become ready for accepting the next data byte; the processor then executes the ISR for the next data byte transfer and subsequently returns to the main program.
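The contrast between polled and interrupt-driven transfer can be shown with a toy cost model. It is only a sketch under simplifying assumptions: the device is assumed to need the same number of processor cycles (`ready_delay`) before it is ready for each byte, and the function names and dictionary keys are hypothetical.

```python
def polled_io(num_bytes, ready_delay):
    """Programmed I/O: the CPU checks the ready flag before every byte."""
    wasted_polls = 0
    for _ in range(num_bytes):
        wasted_polls += ready_delay      # status checks that found the device busy
        # device ready: one OUT instruction transfers the byte
    return {"transferred": num_bytes, "wasted_polls": wasted_polls, "main_work": 0}


def interrupt_io(num_bytes, ready_delay):
    """Interrupt-driven I/O: the main program runs until INT arrives."""
    main_work = 0
    for _ in range(num_bytes):
        main_work += ready_delay         # cycles spent executing the main program
        # INT received: the ISR sends one byte with OUT, then RET to main program
    return {"transferred": num_bytes, "wasted_polls": 0, "main_work": main_work}


polled = polled_io(num_bytes=8, ready_delay=100)
interrupted = interrupt_io(num_bytes=8, ready_delay=100)
```

Both schemes move the same eight bytes, but the 800 cycles that polling spends re-reading the status flag become useful main-program work in the interrupt-driven case.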
Thus the processor executes the main program and the ISR in an interleaved fashion.
Here, the processor does not remain idle.
Disadvantage
The I/O device has to be connected to one of the interrupt lines of the processor.
Advantage
The microprocessor's time is not wasted at all for the data transfer operation.

4.6.4 DMA TRANSFER

     Interrupt based data transfer            DMA based data transfer
1.   The I/O devices interrupt the            The I/O devices send their request
     microprocessor directly using the        through the DMA controller.
     (INT) signal.
2.   The processor is involved in the         The processor is not involved in
     data transfer to or from the I/O         the process of data transfer.
     device.
3.   The processor does not lose              The processor loses control over
     control over the system bus at           the system bus, and the DMA
     any time.                                controller takes charge of the bus
                                              until the data transfer is completed.
                                              It returns the system bus to the μP
                                              after completion of the transfer.
4.   If an interrupt occurs in the main       There are various methods of DMA
     program, the processor executes the      data transfer:
     corresponding ISR and then returns       - Byte/cycle stealing mode
     to the next line of the same             - Burst/Demand mode
     program.                                 - Continuous/Block mode
5.   No extra hardware is required in         Extra hardware is required, called
     this method.                             the DMA controller (like the 8237).

There are a few similarities between interrupt driven data transfer and DMA based data transfer.
1. Both interrupt based and DMA based data transfers are used for transferring data to or from I/O devices.
2. In both, I/O devices send a request to get served by the processor.
3. In both, memory read as well as I/O write operations can take place.
4. In both interrupt based and DMA based data transfer, the transfer can take place through the system buses only.
5. In both, microprocessor time is not wasted at all in the data transfer.

Interrupts are requested and acknowledged in much the same way as DMA requests. However, an interrupt is not a request for bus control; rather, it asks the CPU to begin executing an interrupt service program. The interrupt program performs tasks such as initiating an IO operation or responding to an error encountered by the IO device. The CPU transfers control to this program in essentially the same way it transfers control to a subroutine. The CPU responds to interrupts only between instruction cycles.

4.7 RESPONSIBILITIES OF I/O INTERFACE

The input-output subsystem of a computer provides an efficient mode of communication between the central system and the outside environment.
Programs and data must be entered into computer memory for processing, and results obtained from computation must be recorded or displayed.
Input and output devices attached to the computer, either on-line or off-line, are called peripherals.
4.7.1 INPUT-OUTPUT INTERFACE

The input-output interface provides a method for transferring information between internal storage and external I/O devices.
Peripherals connected to a computer need special communication links to interface them with the central processing unit.
The purpose of the communication links is to resolve the differences that exist between the central computer and each peripheral.
The major differences between the CPU and peripherals are as follows:
1) Peripherals are electromechanical and electromagnetic devices, and their manner of operation is different from the operation of the CPU and memory.
2) The data transfer rate of peripherals is usually slower than the transfer rate of the CPU.
3) Data codes and formats in peripherals differ from the word format in the CPU and memory.

To resolve these differences, computer systems include special hardware components between the CPU and peripherals to supervise and synchronize all input and output transfers. These components are called interface units.
Each device may have its own controller that supervises the operation of a particular mechanism in a peripheral.
An I/O interface has two major roles, as follows:
1) Interface to the CPU and memory via the system bus.
2) Interface to one or more I/O devices via the data links.
Links to peripheral devices are used to exchange control, status and data between the I/O system, the peripheral and the bus.
The I/O interface contains logic for performing the functions of communication between the peripheral and the bus.
The I/O system must have an interface internal to the computer and an I/O interface external to the computer.
The major requirements for an I/O system can be given as:
1) Control & timing
2) CPU communication
3) Device communication
4) Data buffering
5) Error detection

Control and timing is required to co-ordinate the flow of traffic between internal resources and external devices.
CPU communication involves exchange of data between the CPU and the I/O system over the data bus.
Data buffering is required because of the difference in data transfer rate between the CPU or memory and the peripherals.
The data coming from memory or the CPU are sent to the I/O system buffer and then sent to the peripheral device at its own data rate.
The I/O system is also responsible for error detection and for reporting errors to the CPU.

4.7.2 I/O INTERFACE CIRCUITS

The task of connecting an I/O device to a computer system is greatly simplified by the use of standard ICs variously known as I/O interface circuits.
These circuits allow I/O devices to be connected to a standard bus with minimum hardware or software.

4.8 IBM 370 I/O CHANNEL

The I/O processor in the IBM 370 computer is called an I/O channel.
There are three types of channels:
i) multiplexer,
ii) selector and
iii) block multiplexer.
The multiplexer channel can be connected to a number of slow and medium speed devices and is capable of operating with a number of I/O devices simultaneously.
The selector channel is designed to handle one I/O operation at a time and is normally used to control one high-speed device.
The block multiplexer channel combines the features of both the multiplexer and selector channels.
It provides a connection to a number of high speed devices, but all I/O transfers are conducted with an entire block of data, as compared to a multiplexer channel, which can transfer only one byte at a time.
The CPU communicates directly with the channel through dedicated control lines and indirectly through reserved storage areas in memory. The figure above shows the word formats associated with the channel operation.
The I/O instruction format has three fields:
(i) Operation code
(ii) Channel address and
(iii) Device address
The computer system may have a number of channels, and each is assigned an address. Each channel may be connected to several devices, and each device is assigned an address.
The operation code specifies one of eight I/O instructions: start I/O, start I/O fast release, test I/O, clear I/O, halt I/O, halt device, test channel and store channel identification.
The addressed channel responds to each of the I/O instructions and executes it.
The four condition codes in the Processor Status Word (PSW) specify whether the channel or the device is busy, whether it is operational or not, whether interrupts are pending, and whether the I/O operation started successfully.
The status field identifies the state of the device and channel, and any errors that occurred during the transfer.
The format of the Channel Command Word is shown in the figure.
The data address field specifies the first address of the memory buffer, and the count field gives the number of bytes involved in the transfer.
The command field specifies an I/O operation, and the flag bits provide additional information for the channel.
The command field corresponds to an operation code that specifies basic types of I/O operations.
1. Write: Transfer data from memory to I/O device.
2. Read: Transfer data from I/O device to memory.
3. Read backwards: Read magnetic tape with the tape moving backward.
4. Control: Used to initiate an operation not involving transfer of data.
5. Sense: Informs the channel to transfer its channel status word to memory.
6. Transfer in channel: Used in place of a jump instruction.

4.8.1 BUS ARBITRATION

Several master or slave units connected to a shared bus may request access to the bus at the same time. The mechanism that selects which unit gets the bus is called bus arbitration.
There are various types of arbitration schemes:
Daisy chaining
Polling
Independent Requesting
4.8.2 DAISY CHAINING
This method involves three control signals, to which we assign the generic names BUS REQUEST, BUS GRANT and BUS BUSY.
The bus controller responds to a BUS REQUEST signal only if BUS BUSY is inactive; the unit that is granted the bus keeps BUS BUSY active for the duration of its bus activity.
When the first unit requesting access to the bus receives BUS GRANT, it blocks further propagation of that signal, activates BUS BUSY and begins to use the bus. When a non-requesting unit receives the BUS GRANT signal, it forwards the signal to the next unit.
If two units request the bus simultaneously, the one that receives BUS GRANT first gets access to the bus.
Selection priority is therefore determined by the order in which the units are linked by the BUS GRANT line.

4.9 POLLING
In this scheme, the BUS GRANT line of the daisy-chain method is replaced by a set of poll-count lines that are connected directly to all units on the bus, as shown below:
In response to a signal on BUS REQUEST, the bus controller generates a sequence of numbers on the poll-count lines. Each unit compares these numbers with its own address, and a requesting unit whose address matches takes control of the bus.
The priority of a bus unit is determined by the position of its address in the polling sequence. This sequence can be programmed if the poll-count lines are connected to a programmable register. Hence selection priority can be altered under software control.
The advantage of polling over daisy chaining is that in polling a failure in one unit need not affect other units.
This flexibility is achieved at the cost of more control lines. Also, the number of units that can share the bus is limited by the addressing capability of the poll-count lines.

4.10 INDEPENDENT REQUESTING
This scheme has separate BUS REQUEST and BUS GRANT lines for every unit.
It provides the bus controller with immediate identification of all requesting units.
It responds rapidly to requests for bus access.
The bus control unit determines priority, which is programmable.
The main drawback of bus control by independent requests is that 2n BUS REQUEST and BUS GRANT lines must be connected to the bus controller in order to control n devices.
Daisy chaining requires two such lines, while polling requires approximately log2 n lines.
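The arbitration schemes above can be compared with a small sketch. The function names, the list-of-booleans model of the request lines, and the rounding of log2 n up to a whole number of poll-count lines are assumptions made for illustration; they do not come from the text.

```python
def daisy_chain_grant(requests):
    """Return the index of the unit that captures BUS GRANT, or None.

    requests[i] is True when unit i asserts BUS REQUEST. Unit 0 is assumed
    to sit closest to the bus controller, so it sees BUS GRANT first.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            # A requesting unit blocks further propagation of BUS GRANT.
            return position
        # A non-requesting unit forwards BUS GRANT to the next unit.
    return None


def arbitration_lines(n):
    """Approximate number of request/grant (or poll-count) lines for n units."""
    return {
        "daisy_chaining": 2,
        # ceil(log2 n) poll-count lines, computed without floating point
        "polling": max(1, (n - 1).bit_length()),
        "independent_requesting": 2 * n,
    }
```

With units 1 and 2 both requesting, `daisy_chain_grant([False, True, True])` selects unit 1, which shows how position on the chain fixes priority. For eight units, `arbitration_lines(8)` gives 2, 3 and 16 lines for the three schemes.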
4.11 LOCAL COMMUNICATION
Local communication is also called bus communication. The various processor-level components (CPU, IOP, main memory, I/O devices) of a computer system are interconnected by buses.
Many bus organizations are possible. Two very common types are shown as follows:
Example:
A simple time-sharing network, as shown in the following figure, connects many user terminals to a remotely located computer via the public telephone system.
Dedicated links allow very fast transfer of information through the system.
All n devices may send or receive information simultaneously, and there is no delay due to busy connections.
Systems with dedicated links are inherently reliable, since the failure of any link affects communication only between the two units connected to that link.

4.11.5 BUS CONTROL
In most computers, the CPU is the usual bus master, while the memory and I/O interface circuits are the slaves.
IOPs and certain other I/O controllers can also serve as the bus master.
Only a master can initiate a data transfer.
A bus slave can only respond to commands issued by a bus master.
In synchronous buses each item is transferred during a time slot known in advance to both the source and destination units. This implies that the bus interface circuits of both units are synchronized.
Synchronization can be achieved by driving both units from a common clock source, a method that is feasible over short distances.
An alternative approach widely used in local bus communication is asynchronous communication, in which each item being transferred is accompanied by a separate control signal to indicate its presence to the destination unit.
The destination unit may respond with another control signal to acknowledge receipt of the information.
As each device can generate these control signals at its own rate, the data transmission rate can vary with the inherent speed of the communicating devices.
This flexibility in transmission rates is achieved at the cost of more complex bus control circuitry.

4.11.6 INTERRUPT MECHANISM
Interrupts are used for any infrequent or exceptional event that causes the CPU to temporarily transfer control from its current program to another program. An interrupt handler services the event.
Interrupts are the primary means by which I/O devices obtain the services of the CPU.
I/O interrupts are external requests to the CPU to initiate or terminate an I/O operation.
Interrupts are also produced by hardware or software error-detection circuits that invoke error-handling routines within the operating system.
A power supply failure, for instance, can generate an interrupt that requests execution of an interrupt handler designed to save critical data about the system's state.
Interrupts generated internally by the CPU are called traps.
An operating system can interrupt a user program that exceeds its allotted time.
The basic method of interrupting the CPU is by activating a control line, with the generic name INTERRUPT REQUEST, that connects the interrupt source to the CPU.
An interrupt indicator is stored in a CPU register. This register is tested periodically, usually at the end of every instruction cycle.
On recognizing the presence of an interrupt, the CPU must execute a specific interrupt servicing program.
A problem is caused by the presence of two or more interrupt requests at the same time.
Priorities must be assigned to the interrupts, and the interrupt with the highest priority is selected for service.
When an interrupt occurs, the following steps are taken by the CPU:
1) The CPU identifies the source of the interrupt by polling the I/O devices.
2) The CPU obtains the memory address of the required interrupt handler. This address can be provided by the interrupting device along with its interrupt request.
3) The program counter (PC) and other CPU status information are saved in memory.
4) The program counter (PC) is loaded with the address of the interrupt handler. Execution proceeds until a return instruction is encountered, which transfers control back to the interrupted program.
Interrupts may be maskable or non-maskable.
Maskable interrupts can be enabled or disabled by instructions in the instruction set.
When a higher-priority interrupt is being serviced, lower-priority interrupts are disabled by the CPU.
Low-priority interrupts are served when no interrupts with higher priority are awaiting execution.

4.11.7 SINGLE LEVEL AND MULTILEVEL INTERRUPTS
The microprocessor has only one general interrupt request line (INT), to which multiple interrupting devices are connected.
The interrupt is sensed by the microprocessor. It sets its internal interrupt flip-flop to logic 1 and then gives the corresponding acknowledgement (INT ACK).
The interrupt acknowledgement is used to locate the device which has actually interrupted the microprocessor.
When the microprocessor is interrupted by a device, it executes the corresponding ISR to service that device.
In a single-level interrupt system, virtually any number of interrupting I/O devices can be connected.
Disadvantage: Identifying the I/O device which has actually interrupted the processor is a time-consuming process.

4.11.8 MULTI-LEVEL / MULTI-LINE INTERRUPT
Here the number of interrupting devices that can be connected to the processor is limited to the number of general interrupt request lines available to the processor.
For each interrupt request line the processor has one interrupt flag bit.
For each interrupt request line there is a separate ISR used to service that I/O device.
If multiple interrupt requests arrive from I/O devices, the processor internally resolves their priorities and executes the corresponding ISR. This provides a fast response to interrupt requests coming from I/O devices.
Drawback: Only a limited number of I/O devices can be connected.

4.11.9 VECTORED INTERRUPTS
There are two possible implementations of vectored interrupts, depending on the type of microprocessor.
Here, the interrupt requests coming from I/O devices are stored in the interrupt register by setting the corresponding bits.
The interrupt mask register is user programmable and is used to disable (mask) the corresponding interrupt requests.
When multiple interrupts are forwarded to the input of the priority encoder, it encodes only the highest-priority interrupt input and generates the corresponding code.
This code is inserted at predefined locations in the Program Counter (PC). Control is then transferred to the ISR, which services that I/O device.
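The register, mask and priority-encoder path described above can be sketched as follows. Several details are assumptions made for illustration, because the text does not fix them: a mask bit of 1 enables a request, a higher bit position means higher priority, and the `base` and `entry_size` parameters used to form the handler address are hypothetical.

```python
def highest_priority_interrupt(request_register, mask_register, width=8):
    """Return the bit position of the highest-priority enabled request, or None.

    Assumptions: a mask bit of 1 enables the matching request line, and a
    higher bit position means a higher priority.
    """
    pending = request_register & mask_register
    for position in range(width - 1, -1, -1):
        if pending & (1 << position):
            return position  # the code produced by the priority encoder
    return None


def vector_address(code, base=0x0000, entry_size=4):
    """Form a handler address from the encoder code (hypothetical layout)."""
    return base + code * entry_size
```

For example, with requests `0b00101100` and mask `0b11011111`, request 5 is masked out, so the encoder reports request 3. Changing the mask register changes which request wins without touching the devices.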
5 MULTIPLE PROCESSOR ORGANISATION
Here, multiple instruction streams are fetched by the control units. These instruction streams are decoded to get multiple decoded instruction streams (IS), which operate on multiple data streams (DS) taken from shared memory modules. Here, each processing element executes one instruction stream and operates on exactly one data stream.

5.2 MULTIPROCESSOR
The co-ordination among processors required to execute the MESI protocol creates its own bottleneck on the shared bus.
The result is that typical multiprocessor systems are limited to a few tens of processors.
A shared-memory multiprocessor with thousands of processors does not appear to be practical.
It is used in the design of large structures like dams, supersonic jets, ships, etc. In these designs a large number of partial differential equations must be solved concurrently, hence parallel architectures and algorithms are used.
1. Bandwidth of the time-shared bus, which defines the speed of message transfer.
2. Message length.
3. Message arrival rate, which reflects the degree of interaction/coupling between the various computer modules.

SIMD versus MIMD
No. | SIMD | MIMD
1 | It is also called an Array Processor. | It is also called a Multiprocessor.
2 | Here, a single stream of instructions is fetched. | Here, multiple streams of instructions are fetched.
3 | In SIMD (single instruction, multiple data) the instruction stream is fetched from shared memory. | In MIMD (multiple instruction, multiple data) the instruction streams are fetched by the control units.
4 | Here the instruction is broadcast to multiple processing elements (PEs), which operate on different data sets (i.e. data streams taken from shared memory modules). Hence the name SIMD. | Here the instruction streams are decoded to get multiple decoded instruction streams (IS), which operate on multiple data streams (DS) taken from shared memory modules.
5 | Diagram: | Diagram:
  | Here, mI = 1 and mD > 1. Example: ILLIAC IV has a single program-control unit and many independent execution units. | Here, mI > 1 and mD > 1. This covers multiprocessors, which are computers with more than one CPU and the ability to execute several programs simultaneously.
mI: minimum number of instruction streams
mD: minimum number of data streams

Loosely Coupled versus Tightly Coupled Multiprocessor
No. | Loosely Coupled Multiprocessor | Tightly Coupled Multiprocessor
1 | Communication between computer modules takes place through a message transfer system. | Communication between processors takes place through the PMIN.
2 | Diagram |
3 | The processors do not share memory, and each processor has its own local memory. | There is a single system-wide primary memory that is shared by all the processors.
4 | All physical communication between the processors is done by passing messages across the network that interconnects the processors. | Communication between the processors usually takes place through the shared memory.
5 | The interaction between the various processors and computer modules is very low. | The interaction between the various processors is very high.
6 | A channel and arbiter switch is used to connect each computer module to the message transfer system. | The processors and shared memory modules communicate through the PMIN.
7 | The processor is directly connected to I/O devices. | The processors are sometimes referred to as a tightly coupled system.
8 | Distributed-memory multiprocessors are sometimes referred to as loosely coupled systems. | Shared-memory multiprocessors are sometimes referred to as tightly coupled systems.
9 | There are no unmapped local memories. | Every processor has a small unmapped local memory which is used to store code.

Devices such as the keyboard and mouse are connected directly to the computer with which they are used, typically through a serial communication link.

5.7 ASYNCHRONOUS TRANSMISSION
The simplest scheme for serial communication is asynchronous transmission using a technique called start-stop. Data are organized in small groups of 6 to 8 bits with a well-defined beginning and end for timing recovery.
In a typical arrangement, alphanumeric characters encoded in 8 bits are transmitted as shown in the figure.
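A start-stop frame can be sketched in a few lines. The sketch assumes the usual layout: a 0 start bit, the data bits, an optional parity bit, and one or more 1 stop bits. The function names are hypothetical, and sending the data bits in the order they are written is an assumption, since a real link fixes the bit order by convention.

```python
def frame_character(data_bits, parity=None, stop_bits=1):
    """Build a start-stop frame as a string of '0'/'1' in transmission order.

    data_bits: string of '0'/'1', sent in the order given (an assumption).
    parity: None, "even" or "odd".
    """
    if parity not in (None, "even", "odd"):
        raise ValueError("parity must be None, 'even' or 'odd'")
    frame = "0" + data_bits  # start bit is 0; the idle line is 1
    if parity is not None:
        ones = data_bits.count("1")
        # Pick the parity bit so the total count of 1s is even or odd.
        parity_bit = ones % 2 if parity == "even" else (ones + 1) % 2
        frame += str(parity_bit)
    return frame + "1" * stop_bits  # stop bits are 1


def characters_per_second(bit_rate, frame_length):
    """Characters carried per second when every frame has frame_length bits."""
    return bit_rate / frame_length
```

Seven data bits with one parity bit and one stop bit give a 10-bit frame, so a 1000 bits/second link carries 100 such characters per second. The start and stop bits are pure overhead, which is the price paid for per-character timing recovery.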
→ The line connecting the transmitter and receiver is in the 1 state when idle.
→ Transmission of a character is preceded by a 0 bit, referred to as the start bit, followed by eight data bits and one or two stop bits.
→ The stop bits have a logic value 1. The start bit alerts the receiver that data transmission is about to begin.
→ The leading edge of the start bit is used to synchronize the receiver's clock with that of the transmitter.
→ The stop bits at the end delineate consecutive characters in the case of continuous transmission.
→ Inserting and removing the start and stop bits is the responsibility of the transmission and reception circuitry.

For proper synchronization at the receiving end, the receiver clock is derived from a local clock whose frequency is substantially higher than the transmission rate (typically 16 times higher).
This clock is used to increment a modulo-16 counter, which is reset to 0 when the leading edge of a start bit is detected. When the counter counts up to 8, it indicates that the middle of the start bit has been reached.
The value of the start bit is sampled to confirm that it is a valid start bit, and the counter is again reset to 0.
Thereafter, whenever the count reaches 16, the incoming data signal is sampled, which should be close to the middle of each bit transmitted.
Therefore, as long as the relative positioning of bits within a transmitted character is not in error by more than half a bit period (eight cycles of the local clock), the receiver correctly interprets the bits of the encoded characters.
While transmitting characters, they are represented by the 7-bit ASCII code occupying bits 0 through 6 in the figure above. The MSB, i.e. bit 7 of the transmitted byte, is usually set to 0. Alternatively, it may be used as a parity bit, to aid in detecting transmission errors.
Parity bit = 1 if the transmitted data contains an odd number of 1s
Parity bit = 0 otherwise
When a parity bit is used, it is set by the transmitter such that the parity of the 8 bits transmitted is always the same, i.e. either odd or even.
If a transmission error causes the value of one bit to change, the receiver will detect an incorrect parity and hence will be able to determine that an error has occurred.
Disadvantage
In the start-stop method explained above, the position of the 1-to-0 transition at the beginning of the start bit, shown in the figure above, is the key to obtaining correct timing.
Therefore, this scheme is useful only where the speed of transmission is sufficiently low and the conditions on the transmission link are such that the square waveform shown in the figure maintains its shape. For higher speeds and longer lines, much signal degradation takes place.

5.8 SYNCHRONOUS TRANSMISSION
In synchronous transmission, data are transmitted in blocks consisting of several hundreds or thousands of bits each. The start and end of each block are marked by appropriate codes, and data within a block are organized according to an agreed-upon set of rules. Modems require a significant start-up time for the transmission and detection of carrier frequencies and for the establishment of synchronization.

5.9 SOLVED EXAMPLES

1. An asynchronous serial communication controller that uses a start-stop scheme for controlling the serial I/O of a system is programmed for a string of length 7 bits, one parity bit (odd parity) and one
stop bit. The transmission rate is 1000 bits/second.
i) What is the complete bit stream that is transmitted for the string '0110101'?
ii) How many strings can be transmitted per second?
Solution
i) The frame consists of the start bit 0, the data bits 0110101, the parity bit 1 (the data contain four 1s, so odd parity requires a 1) and the stop bit 1. With the bits written in the order given, the complete bit stream is 0011010111 (10 bits).
ii) The number of strings that can be transmitted per second = 1000 / 10 = 100.

2. Consider a CRT display that has a text mode display format of 75 x 30 characters with a 9 x 12 character cell. What is the size of the video buffer RAM for the display to be used in monochrome (1 bit per pixel) graphics mode?
Solution
Number of bits required
= No. of characters x cell size
= 75 x 30 x 9 x 12
= 243000 bits
= 30375 bytes
= size of RAM

3. The logical address space in a computer system consists of 128 segments of capacity 32 pages of 4K words. The physical memory consists of 4K page frames, each of 4K words capacity.
i) Formulate the logical and physical address.
ii) Give the block diagram for table translation.
Solution
Formation of logical address:
128 = 2^7, so 7 bits are required to address 128 segments.
32 = 2^5, so 5 bits are required to address 32 pages within each segment.
Page size is 4K (2^12) words, so 12 bits are required to address a word in the page.
Formation of physical address:
4K = 2^12, so 12 bits are required to address the 4K page frames.
The page frame size is 4K (2^12) words, so 12 bits are required to address a word in the page frame.
Logical address: Segment (7 bits) | Page (5 bits) | Word (12 bits)
Physical address: Block (12 bits) | Word (12 bits)
The logical address is partitioned into 3 fields. The segment field specifies a segment number, the page field specifies a page within the segment and the word field gives a specific word within the page. A page field of k bits can specify up to 2^k pages. A segment number may be associated with just one page or with as many as 2^k pages. Thus the length of a segment can vary according to the number of pages assigned to it.
The mapping of a logical address into a physical address is as shown in the figure. The segment number of the logical address specifies the address for the segment table. The entry in the segment table is a pointer address for a page table base. The page table base is added to the page number given in the logical address. The sum produces a pointer address to an entry in the page table. The value found in the page table provides the block number in the physical memory.
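The segment-table and page-table mapping in Example 3 can be traced with a short sketch. The field widths (7, 5 and 12 bits) come from the example; the function names and the use of Python dictionaries for the two tables are assumptions made for illustration.

```python
SEGMENT_BITS, PAGE_BITS, WORD_BITS = 7, 5, 12


def split_logical(address):
    """Split a 24-bit logical address into (segment, page, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    page = (address >> WORD_BITS) & ((1 << PAGE_BITS) - 1)
    segment = (address >> (WORD_BITS + PAGE_BITS)) & ((1 << SEGMENT_BITS) - 1)
    return segment, page, word


def translate(address, segment_table, page_table):
    """Map a logical address to a physical address.

    segment_table[segment] holds the page table base for that segment;
    page_table[base + page] holds the 12-bit block number.
    """
    segment, page, word = split_logical(address)
    base = segment_table[segment]    # pointer to the page table base
    block = page_table[base + page]  # base + page number selects the entry
    return (block << WORD_BITS) | word  # block number followed by word
```

As a usage example, take segment 3 with page table base 0x20 and let page 2 of that segment sit in block 0xABC. The logical address with word offset 0x123 then maps to physical address 0xABC123, since the word field passes through unchanged.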
GATE QUESTIONS
1 CACHE AND MAIN MEMORY
Q.1 A graphics card has on board memory of 1 Mbyte. Which of the following modes can the card not support?
a) 1600 x 400 resolution with 256 colors on a 17 inch monitor
b) 1600 x 400 resolution with 16 million colors on a 14 inch monitor
c) 800 x 400 resolution with 16 million colors on a 17 inch monitor
d) 800 x 800 resolution with 256 colors on a 14 inch monitor
[GATE-2000]

Q.2 Which of the following requires a device driver?
a) Register b) Cache
c) Main memory d) Disk
[GATE-2002]

Q.3 More than one word is put in one cache block to
a) exploit the temporal locality of reference in a program
b) exploit the spatial locality of reference in a program
c) reduce the miss penalty
d) None of the above
[GATE-2002]

Q.4 Increasing the RAM of a computer typically improves performance because
a) virtual memory increases
b) larger RAMs are faster
c) fewer page faults occur
d) fewer segmentation faults occur
[GATE-2005]

Q.5 Consider a small two-way set-associative cache memory, consisting of four blocks. For choosing the block to be replaced, use the Least Recently Used (LRU) scheme. The number of cache misses for the following sequence of block addresses is 8, 12, 0, 12, 8
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Statements for Linked Questions no 6 & 7
A CPU has a 32 Kbyte direct mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size 512 x 512 with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.
P1 :
for (i = 0; i < 512; i++) {
    for (j = 0; j < 512; j++) {
        x += A[i][j];
    }
}
P2 :
for (i = 0; i < 512; i++) {
    for (j = 0; j < 512; j++) {
        x += A[j][i];
    }
}
P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers. Let the number of cache misses experienced by P1 be M1 and that for P2 be M2.

Q.6 The value of M1 is
a) Zero b) 2048
c) 16384 d) 262144
[GATE-2006]

Q.7 The value of the ratio M1/M2 is
a) Zero b) 1/16
c) 1/8 d) 16
[GATE-2006]
Common Data for Questions no 8 and 9
Consider two cache organizations: The first one is 32 Kbyte 2-way set associative with 32-byte block size. The second one is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of 0.6 ns while a k-bit comparator has a latency of k/10 ns. The hit latency of the set associative organization is h1 while that of the direct mapped one is h2.

Q.8 The value of h1 is
a) 2.4 ns b) 2.3 ns
c) 1.8 ns d) 1.7 ns
[GATE-2006]

Q.9 The value of h2 is
a) 2.4 ns b) 2.3 ns
c) 1.8 ns d) 1.7 ns
[GATE-2006]

Statements for Linked Questions no 10 and 11
Consider a machine with a byte addressable main memory of 2^16 bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 x 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.

Q.10 How many data cache misses will occur in total?
a) 48 b) 50
c) 56 d) 59
[GATE-2007]

Q.11 Which of the following lines of the data cache will be replaced by new blocks in accessing the array for the second time?
a) line 4 to line 11 b) line 4 to line 12
c) line 0 to line 7 d) line 0 to line 8
[GATE-2007]

Q.12 Consider a 4-way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are respectively
a) 9, 6, 5 b) 7, 7, 6
c) 7, 5, 8 d) 9, 5, 6
[GATE-2007]

Q.13 In an instruction execution pipeline, the earliest that the data TLB (Translation Lookaside Buffer) can be accessed is
a) Before effective address calculation has started
b) During effective address calculation
c) After effective address calculation has completed
d) After data cache lookup has completed

Common Data for Questions no 14, 15 and 16
Consider a machine with a 2-way set associative data cache of size 64 Kbyte and block size 16 bytes. The cache is managed using 32-bit virtual addresses and the page size is 4 Kbyte. A program to be run on this machine begins as follows.
double ARR[1024][1024];
int i, j;
/* Initialize array ARR to 0.0 */
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;
The size of double is 8 bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR.

Q.14 The total size of the tags in the cache directory is
a) 32 Kbit b) 34 Kbit
c) 64 Kbit d) 68 Kbit
[GATE-2008]
Q.15 Which of the following array elements has the same cache index as ARR[0][0]?
a) ARR[0][4] b) ARR[4][0]
c) ARR[0][5] d) ARR[5][0]
[GATE-2008]

Q.16 The cache hit ratio for this initialization loop is
a) 0% b) 25%
c) 50% d) 75%
[GATE-2008]

Q.17 For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
1. L1 must be a write-through cache.
2. L2 must be a write-through cache.
3. The associativity of L2 must be greater than that of L1.
4. The L2 cache must be at least as large as the L1 cache.
a) 4 b) 1 and 4
c) 1, 2 and 4 d) 1, 2, 3 and 4
[GATE-2008]

Q.18 How many 32 K x 1 RAM chips are needed to provide a memory capacity of 256 Kbyte?
a) 8 b) 32
c) 64 d) 128
[GATE-2009]

Q.19 A main memory unit with a capacity of 4 megabytes is built using 1 M x 1 bit DRAM chips. Each DRAM chip has 1 K rows of cells with 1 K cells in each row. The time taken for a single refresh operation is 100 ns. The time required to perform one refresh operation on all the cells in the memory unit is
a) 100 ns b) 100 * 2^10 ns
c) 100 * 2^20 ns d) 3200 * 2^20 ns
[GATE-2010]

Q.20 Consider a 4-way set associative cache (initially empty) with total 16 cache blocks. The main memory consists of 256 blocks and the request for memory blocks is in the following order:
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
Which one of the following memory blocks will not be in the cache if LRU replacement policy is used?
a) 3 b) 8
c) 129 d) 216
[GATE-2010]

Statements for Linked Questions no 21 and 22
A computer system has an L1 cache, an L2 cache and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 ns, 20 ns and 200 ns for L1 cache, L2 cache and main memory unit respectively.

Q.21 When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?
a) 2 ns b) 20 ns
c) 22 ns d) 88 ns
[GATE-2010]

Q.22 When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?
a) 222 ns b) 880 ns
c) 902 ns d) 968 ns
[GATE-2010]

Q.23 An 8 Kbyte direct mapped write-back cache is organized as multiple blocks, each of size 32 bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising the following:
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block mapped in the cache.
What is the total size of memory needed at the cache controller to store metadata (tags) for the cache?
a) 4864 bit b) 6144 bit
c) 6656 bit d) 5376 bit
[GATE-2011]

Q.24 In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s + 1). The main memory blocks are numbered 0 onwards. The main memory block numbered j must be mapped to any one of the cache lines from
a) (j mod v) * k to (j mod v) * k + (k - 1)
b) (j mod v) to (j mod v) + (k - 1)
c) (j mod k) to (j mod k) + (v - 1)
d) (j mod k) * v to (j mod k) * v + (v - 1)
[GATE-2013]

Q.25 A RAM chip has a capacity of 1024 words of 8 bits each (1K x 8). The number of 2 x 4 decoders with enable line needed to construct a 16K x 16 RAM from 1K x 8 RAM is
a) 4 b) 5
c) 6 d) 7
[GATE-2013]

Q.26 An access sequence of cache block addresses is of length N and contains n unique block addresses. The number of unique block addresses between two consecutive accesses to the same block address is bounded above by k. What is the miss ratio if the access sequence is passed through a cache of associativity A ≥ k exercising least-recently-used replacement policy?
a) n/N b) 1/N
c) 1/A d) k/n
[GATE-2014-1]

Q.27 A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is _____
[GATE-2014-2]

Q.28 In designing a computer's cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?
a) A smaller block size implies better spatial locality
b) A smaller block size implies a smaller cache tag and hence lower cache tag overhead
c) A smaller block size implies a larger cache tag and hence lower cache hit time
d) A smaller block size incurs a lower cache miss penalty
[GATE-2014-2]

Q.29 If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?
a) Width of tag comparator
b) Width of set index decoder
c) Width of way selection multiplexor
d) Width of processor to main memory data bus
[GATE-2014-2]

Q.30 The memory access time is 1 nanosecond for a read operation with a hit in cache, 5 nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write operation with a hit in cache and 10 nanoseconds for a write operation with a miss in cache. Execution of a
sequence of instructions involves 100 instruction fetch operations, 60
memory operand read operations and 40 memory operand write operations.
The cache hit-ratio is 0.9. The average memory access time (in
nanoseconds) in executing the sequence of instructions is __________.
[GATE-2014-3]
Q.31 Assume that for a certain processor, a read request takes 50
nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose
while running a program, it was observed that 80% of the processor's
read requests result in a cache hit. The average read access time in
nanoseconds is _______.
[GATE-2015-2]
Q.32 Consider a machine with byte addressable main memory of 2^20
bytes, block size of 16 bytes and a direct mapped cache having 2^12
cache lines. Let the address of two consecutive bytes in main memory
be (E201F)_16 and (E2020)_16. What are the tag and cache line address
(in hex) for main memory address (E201F)_16?
a) E, 201 b) F, 201
c) E, E20 d) 2, 01F
[GATE-2015-3]
Q.33 A processor can support a maximum memory of 4 GB, where the
memory is word addressable (a word consists of two bytes). The size of
the address bus of the processor is at least _________ bits.
[GATE-2016-1]
Q.34 The width of the physical address on a machine is 40 bits. The
width of the tag field in a 512 KB 8-way set associative cache is
_________ bits.
[GATE-2016-2]
Q.35 A file system uses an in-memory cache to cache disk blocks. The
miss rate of the cache is shown in the figure. The latency to read a
block from the cache is 1 ms and to read a block from the disk is
10 ms. Assume that the cost of checking whether a block exists in the
cache is negligible. Available cache sizes are in multiples of 10 MB.
The smallest cache size required to ensure an average read latency of
less than 6 ms is _________ MB.
[GATE-2016-2]
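The address split asked for in Q.32 can be sketched in a few lines of Python. This is only an illustrative check, assuming a direct-mapped cache whose block size and line count are powers of two; the function name split_address is hypothetical and not part of the original material.

```python
def split_address(addr: int, block_bytes: int, num_lines: int):
    """Split a byte address into (tag, line, offset) for a direct-mapped cache.

    Assumes block_bytes and num_lines are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    line_bits = num_lines.bit_length() - 1       # log2(number of lines)
    offset = addr & (block_bytes - 1)            # low bits select the byte
    line = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + line_bits)      # remaining high bits
    return tag, line, offset


# Q.32: 16-byte blocks, 2^12 cache lines, address E201F (hex)
tag, line, offset = split_address(0xE201F, 16, 2**12)
print(f"tag={tag:X} line={line:03X} offset={offset:X}")
```

With 4 offset bits and 12 line bits, the remaining 4 bits of the 20-bit address form the tag.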
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(b) (b) (a) (c) (c) (c) (b) (a) (b) (c) (a) (b) (b) (b)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(b) (b) (a) (c) (d) (d) (c) (a) (d) (b) (b) (a) 20 (d)
29 30 31 32 33 34 35
(d) 1.68 14 (a) 31 24 30
EXPLANATIONS
Q.1 (b)
A graphics card with on board memory of 1 Mbyte cannot support a mode
of 1600×400 resolution with 16 million colors on a 14 inch monitor.

Q.2 (b)
In computing, a device driver or software driver is a computer program
allowing higher-level computer programs to interact with a hardware
device.
A driver typically communicates with the device through the computer
bus or communications subsystem to which the hardware connects. When a
calling program invokes a routine in the driver, the driver issues
commands to the device. Once the device sends data back to the driver,
the driver may invoke routines in the original calling program.
Drivers are hardware dependent and operating system specific. They
usually provide the interrupt handling required for any necessary
asynchronous time-dependent hardware interface.

Q.3 (a)
It is done to exploit the temporal locality of reference in a program,
as cache is the fastest memory available, temporarily holds a mirror
image of part of main memory, and stores more than a word at one time.

Q.4 (c)
When the RAM is increased, the main memory increases and thus the
number of page frames also increases, which results in a reduction of
swapping. Thus, fewer page faults occur. Any replacement rule that does
not suffer from Belady's anomaly then always results in a reduction of
page faults.

Q.5 (c)
The page frames content after applying LRU for the sequence 8, 12, 0,
12, 8 gives a total number of misses = 4.

Q.6 (c)
16 array elements are brought into the cache when the first element
A[0][0] is accessed, and there will be hits for the next 15 accesses,
A[0][1] to A[0][15], which are in cache, and a miss at A[0][16].
Therefore, there are 15 hits and one miss per block, giving
512 × 512/16 = 16384 block transfers during P1.

Q.7 (b)
As the next element required to be accessed after A[0][0] is A[1][0],
the elements A[0][1] to A[0][15] brought into cache are of no use.
Thus, there will be 262144 (512 × 512) misses and no hits.
Therefore,
M1/M2 = 16384/262144 = 1/16.

Q.8 (a)
Consider the following table; from the given data the following is
concluded:
Therefore h1 = 18/10 + 0.6 = 2.4 ns

Q.9 (b)
Consider the following table; from the given data the following is
concluded:
Therefore, h2 = 17/10 + 0.6
= 2.3 ns

Q.10 (c)
The total number of data cache misses that occur is 56, as is clear
from the given data.

Q.11 (a)
The lines that will be replaced by the new blocks are lines 4 to 11.

Q.12 (b)
7 bits are required if there are 128 lines, because 128 is 2^7.
Each line is of 64 words or 2^6 words. Hence, the number of bits
required for the word offset is 6. As given, a 20 bit address is
generated for a word in main memory, so bits required for tag
= 20 – (7 + 6) = 20 – 13 = 7 bits.

Q.13 (b)
During effective address calculation, the translation look aside
buffer data can be accessed earliest.

Q.14 (b)
From the above figure and given conditions the total size of the tags
comes out to be = 17 × 2 × 1024 bits = 34 Kbit

Q.15 (b)
The array element ARR[4][0] has the same cache index as ARR[0][0]
since it is given that page size is of 4 kbyte and 1 row contains 1024
elements, i.e., 2^10 locations.

Q.16 (b)
We know that hit ratio is given as
Number of hits/(Number of hits + Number of misses)
= 4/16 = 25%

Q.17 (a)
For a multilevel cache hierarchy, the condition that is necessary to
hold inclusion is that the L2 cache must be at least as large as the
L1 cache. And since both the levels are write through caches, this is
not a sufficient condition, as is depicted by the following figure:

Q.18 (c)
As given, basic RAM is 32k × 1 and we have to design a RAM of
256k × 8.
Therefore, number of chips required = (256k × 8)/(32k × 1)
= (256 × 1024 × 8)/(32 × 1024 × 1)
(Multiplying and dividing by 1024)
= 64 = 8 × 8
Means, 64 = 8 parallel lines × 8 serial RAM chips

Q.19 (d)
Since the capacity is 4 MB, therefore
4 * 10^6 * 8 = 32 * 10^6 ...A
And 1k * 1k (rows * cells) = 2^20 ...B
Therefore, the time required to perform one refresh operation on all
the cells in the memory unit is A * B
= 32 * 10^6 * 2^20

Q.20 (d)
Set 0: 48, 4, 32, 8, 26, 92
Set 1: 1, 133, 73, 129
Set 2:
Set 3: 255, 155, 3, 159, 63
0 mod 4 = 0 (set 0)
255 mod 4 = 3 (set 3)
Like this the sets in the above table are determined for the other
values.

Q.21 (c)
As already given in question,
Memory access time for L1 = 2 ns
Memory access time for L2 = 20 ns
Now the required time of transfer
= 20 + 2 = 22 ns

Q.22 (a)
As given, memory access time for
main memory = 200 ns
Memory access time for L1 = 2 ns
Memory access time for L2 = 20 ns
Total access time = Block transfer time from main memory to L2 cache
+ Access time of L2 + Access time of L1
Now, the required time of transfer
= 200 + 20 + 2 = 222 ns

Q.23 (d)
Number of blocks in cache
= (8 × 2^10)/32 = (8 × 2^10)/(8 × 2^2) = 2^8
8 bits are needed to identify each block.
Size of block is 32 bytes. Thus 5 bits will be needed to identify a
byte within a block.

Q.24 (b)
Consider the case where the cache blocks are arranged as follows.
0 3 6 9 12
1 4 7 10 13
2 5 8 11 14
The cache is divided into v = 3 sets, each of which consists of k = 5
lines.
Suppose we need to calculate for main memory block j = 9; then,
a) (j mod v) * k to (j mod v) * k + (k - 1)
= (9 mod 3) * 5 to (9 mod 3) * 5 + (5 - 1)
= 0 to 4
b) (j mod v) to (j mod v) + (k - 1)
= (11 mod 3) to (11 mod 3) + 4
= 2 to 6
c) (j mod k) to (j mod k) + (v - 1)
= (11 mod 5) to (11 mod 5) + 2
= 1 to 3
Only (a) gives the correct answer.

Q.25 (b)
RAM chip size = 1k × 8 [1024 words of 8 bits each]
RAM to construct = 16k × 16
Number of chips required = (16k × 16)/(1k × 8)
= 16 × 2 [16 chips vertically with each having 2 chips horizontally]
So to select one chip out of 16 vertical chips, we need a 4 × 16
decoder.
Available decoder is a 2 × 4 decoder.
To be constructed is a 4 × 16 decoder.
So we need five 2 × 4 decoders in total to construct a 4 × 16 decoder.

Q.26 (a)
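For Q.26, the answer n/N can be illustrated with a small LRU simulation. The sketch below is not from the original text: it models a single fully associative set (the function name lru_misses and the access sequence are made up for illustration) and assumes the cache has room for the re-accessed block plus every distinct block seen since its previous access, so only the first access to each block misses.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity):
    """Count misses for an LRU cache holding at most `capacity` blocks."""
    cache = OrderedDict()          # keys ordered from least to most recent
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses


# At most one other distinct block lies between two accesses to the same block.
accesses = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5]
n_unique = len(set(accesses))              # n
misses = lru_misses(accesses, capacity=2)
print(misses, n_unique, misses / len(accesses))
```

Every miss in this run is a first-time (compulsory) miss, so the miss count equals n and the miss ratio is n/N.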
Q.27 (20)
Physical address size = 32 bits
Cache size = 16k bytes = 2^14 bytes
Block size = 8 words = 8 × 4 bytes = 32 bytes
(where each word = 4 bytes)
No. of blocks = 2^14/2^5 = 2^9
Block index = 9 bits
No. of sets = 2^9/4 = 2^7
Set offset = 7 bits
Byte offset: 8 × 4 bytes = 32 bytes = 2^5, so 5 bits
TAG = 32 – (7 + 5) = 20 bits

Q.28 (d)
When the cache block size is smaller, the cache can accommodate more
blocks, which improves the hit ratio, so the miss penalty for the
cache will be lowered.

Q.29 (d)
When associativity is doubled, the set offset is affected;
accordingly, the number of bits used for the TAG comparator is
affected. The width of the set index decoder is also affected when the
set offset is changed. The width of the way selection multiplexer is
affected when the block offset is changed. The width of the processor
to main memory data bus is guaranteed to be NOT affected.

Q.30 (1.68)
Total operations = 100 instruction fetch operations + 60 memory
operand read operations + 40 memory operand write operations
= 200 operations
Time taken for fetching 100 instructions (equivalent to read)
= 90 * 1 ns + 10 * 5 ns = 140 ns
Memory operand read operations = 90%(60) * 1 ns + 10%(60) * 5 ns
= 54 ns + 30 ns = 84 ns
Memory operand write operation time = 90%(40) * 2 ns
+ 10%(40) * 10 ns
= 72 ns + 40 ns = 112 ns
Total time taken for executing 200 operations = 140 + 84 + 112
= 336 ns
∴ Average memory access time = 336 ns/200 = 1.68 ns

Q.31 (14)
Average read access time = [(0.8)(5) + (0.2)(50)] ns
= 4 + 10 = 14 ns

Q.32 (a)
[Figure: address divided into TAG, cache block and word offset fields]

Q.33 (31)
Memory size = 4 GB = 2^32 bytes
Word size = 2 bytes
∴ No. of addresses = Memory size/Word size
= 2^32 bytes/2 bytes = 2^31, i.e. 31 address bits

Q.34 (24)
Cache size 512 KB = 2^19 bytes and 8 ways = 2^3, so
Tag bits = 40 − (19 − 3) = 24 bits

Q.35 (30)
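The arithmetic of Q.30 and Q.31 above can be checked with a short Python sketch. It is only an illustration under the same assumption the explanations use, namely that one hit ratio applies to every kind of operation; the function name average_access_time is hypothetical.

```python
def average_access_time(operations, hit_ratio):
    """Weighted average access time in ns.

    `operations` is a list of (count, hit_time_ns, miss_time_ns) tuples.
    """
    total_time = 0.0
    total_ops = 0
    for count, hit_ns, miss_ns in operations:
        # expected time of one operation, multiplied by how many there are
        total_time += count * (hit_ratio * hit_ns + (1 - hit_ratio) * miss_ns)
        total_ops += count
    return total_time / total_ops


# Q.30: 100 fetches and 60 reads (1 ns hit, 5 ns miss), 40 writes (2 ns, 10 ns)
q30 = average_access_time([(100, 1, 5), (60, 1, 5), (40, 2, 10)], hit_ratio=0.9)
# Q.31: reads only, 5 ns on a hit and 50 ns on a miss, 80% hits
q31 = average_access_time([(1, 5, 50)], hit_ratio=0.8)
print(round(q30, 2), round(q31, 2))
```

The two values correspond to the 336 ns/200 and 4 + 10 calculations in the explanations.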
2 INSTRUCTIONS-PIPELINING & ADDRESSING MODES
RRC A, #1 ; right rotate A through carry by one bit. Thus: if the
initial values of A and the carry flag are A7.....A0 and C0
respectively, their values after the execution of this instruction
will be C0 A7.....A1 and A0 respectively.
JC Y ; jump to Y if carry flag is set
JMP Z ; jump to Z
Y: ADD B, #1 ; B ← B + 1
JMP Z ; jump to Z
X:

Q.8 If the initial value of register A is A0, the value of register B
after the program execution will be
a) the number of 0 bit in A0
b) the number of 1 bit in A0
c) A0
d) B
[GATE-2006]

Q.9 Which of the following instructions when inserted at location X
will ensure that the value of register A after program execution is
the same as its initial value?
a) RRC A, #1
b) NOP ; no operation
c) LRC A, #1 ; left rotate A through carry flag by one bit
d) ADD A, #1
[GATE-2006]

Q.10 A 4-stage pipeline has the stage delays as 150, 120, 160 and 140
ns respectively. Registers that are used between the stages have a
delay of 5 ns each. Assuming constant clocking rate, the total time
taken to process 1000 data items on this pipeline will be
a) 120.4 μs b) 160.5 μs
c) 165.5 μs d) 590.0 μs
[GATE-2006]

Instruction      Operation             Instruction size (in words)
MOV R1, 5000;    R1 ← Memory[5000]     2
MOV R2, R1;      R2 ← Memory[R1]       1
ADD R2, R3;      R2 ← R2 + R3          1
MOV 6000, R2;    Memory[6000] ← R2     2
HALT;            Machine halts         1
[GATE-2006]

Q.11 Consider that the memory is byte addressable with word size 32
bits, and the program has been loaded starting from memory location
1000 (decimal). If an interrupt occurs while the CPU has been halted
after executing the HALT instruction, the return address (in decimal)
saved in the stack will be
a) 1007 b) 1020
c) 1024 d) 1028
[GATE-2006]

Q.12 Let the clock cycles required for various operations be as
follows:
Register to/from memory transfer: 3 clock cycles
Add with both operands in register: 1 clock cycle
Instruction fetch and decode: 2 clock cycles per word
The total number of clock cycles required to execute the program is
a) 29 b) 24
c) 23 d) 20
[GATE-2006]

Statements for Linked Questions no 13 and 14
Consider the following data path of a CPU:
All operations including incrementation of the PC and the GPRs are to
be carried out in the ALU. Two clock cycles are needed for memory read
operation, the first one for loading address in the MAR and the next
one for loading data from the memory bus into the MDR.
[GATE-2006]

Q.13 The instruction "ADD R0, R1" has the register transfer
interpretation R0 <= R0 + R1. The minimum number of clock cycles
needed for execution cycle of this instruction is
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Q.14 The instruction "CALL Rn, sub" is a two word instruction.
Assuming that PC is incremented during the fetch cycle of the first
word of the instruction, its register transfer interpretation is
Rn <= PC + 1
PC <= M[PC]
The minimum number of CPU clock cycles needed during the execution
cycle of this instruction is
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Q.15 A 5 stage pipelined CPU has the following sequence of stages.
IF: Instruction fetch from instruction memory
RD: Instruction decode and register read
EX: Execute: ALU operation for data and address computation
MA: Data memory access: for write access, the register read at RD
stage is used
WB: Register write back
Consider the following sequence of instructions:
L1 : L R0, loc1 ; R0 <= M[loc1]
L2 : A R0, R0, R0 ; R0 <= R0 + R0
L3 : S R2, R0, R2 ; R2 <= R2 - R0
Let each stage take one clock cycle.
What is the number of clock cycles taken to complete the above
sequence of instructions starting from the fetch of L1?
a) 8 b) 10
c) 12 d) 15
[GATE-2006]

Q.16 Consider a three word machine instruction:
ADD A[R0], @B
The first operand (destination) "A[R0]" uses indexed addressing mode
with R0 as the index register. The second operand (source) "@B" uses
indirect addressing mode. A and B are memory addresses residing at the
second and the third words, respectively. The first word of the
instruction specifies the opcode, the index register designation and
the source and destination addressing modes. During execution of ADD
instruction, the two operands are added and stored in the destination
(first operand).
The number of memory cycles needed during the execution cycle of the
instruction is
a) 3 b) 4
c) 5 d) 6
[GATE-2006]

Q.17 Match each of the high level language statements given on the
left hand side with the most natural addressing mode from those listed
on the right hand side.
List I                  List II
P. A[I] = B[J]          1. Indirect addressing
Q. while (*A++);        2. Indexed addressing
R. int temp = *x;       3. Auto increment
a) P-3, Q-2, R-1 b) P-1, Q-2, R-3
c) P-2, Q-4, R-1 d) P-1, Q-2, R-3
[GATE-2006]
Q.18 A CPU has a five-stage pipeline and runs at 1 GHz frequency.
Instruction fetch happens in the first stage of the pipeline. A
conditional branch instruction computes the target address and
evaluates the condition in the third stage of the pipeline. The
processor stops fetching new instructions following a conditional
branch until the branch outcome is known. Program executes 10^9
instructions out of which 20% are conditional branches. If each
instruction takes one cycle to complete on average, the total
execution time of the program is
a) 1.0 s b) 1.2 s
c) 1.4 s d) 1.6 s
[GATE-2006]

Q.19 Consider a new instruction named branch-on-bit-set (mnemonic
bbs). The instruction "bbs reg, pos, label" jumps to label if bit in
position pos of register operand reg is one. A register is 32 bits
wide and the bits are numbered 0 to 31, bit in position 0 being the
least significant.
Consider the following emulation of this instruction on a processor
that does not have bbs implemented.
temp ← reg and mask
Branch to label if temp is non-zero
The variable temp is a temporary register. For correct emulation, the
variable mask must be generated by
a) mask ← 0x1 << pos
b) mask ← 0xffffffff >> pos
c) mask ← pos
d) mask ← 0xf
[GATE-2006]

Q.20 CPU has 24-bit instructions. A program starts at address 300 (in
decimal). Which one of the following is a legal program counter (all
values in decimal)?
a) 400 b) 500
c) 600 d) 700
[GATE-2007]

Common Data for Questions no 20, 21 and 22
Consider the following program segment. Here R1, R2 and R3 are the
general purpose registers.
Instruction            Operation            Instruction size
                                            (Number of words)
      MOV R1, 3000     R1 ← M[3000]         2
LOOP: MOV R2, R3       R2 ← M[R3]           1
      ADD R2, R1       R2 ← R1 + R2         1
      MOV R3, R2       M[R3] ← R2           1
      INC R3           R3 ← R3 + 1          1
      DEC R1           R1 ← R1 - 1          1
      BNZ LOOP         Branch on not zero   2
      HALT             Stop                 1
Assume that the content of memory location 3000 is 10 and the content
of the register R3 is 2000. The content of each of the memory
locations from 2000 to 2010 is 100. The program is loaded from the
memory location 1000. All the numbers are in decimal.

Q.20 Assume that the memory is word addressable. The number of memory
references for accessing the data in executing the program completely
is
a) 10 b) 11
c) 20 d) 21
[GATE-2007]

Q.21 Assume that the memory is word addressable. After the execution
of this program, the content of memory location 2010 is
a) 100 b) 101
c) 102 d) 110
[GATE-2007]

Q.22 Assume that the memory is byte addressable and the word size is
32 bits. If an interrupt occurs during the execution of the
instruction INC R3, what return address will be pushed on to the
stack?
a) 1005 b) 1020
c) 1024 d) 1040
[GATE-2007]

Q.23 Consider a pipelined processor with the following four stages
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID and WB stages take 1 clock cycle each to complete the
operation. The number of clock cycles for the EX stage depends on the
instructions. The ADD and SUB instructions need 1 clock cycle and the
MUL instruction needs 3 clock cycles in the EX stage. Operand
forwarding is used in the pipelined processor.
What is the number of clock cycles taken to complete the following
sequence of instructions?
ADD R2, R1, R0 ; R2 ← R1 + R0
MUL R4, R3, R2 ; R4 ← R3 * R2
SUB R6, R5, R4 ; R6 ← R5 - R4
a) 7 b) 8
c) 10 d) 14
[GATE-2007]

L2 : SUB R4 ← R5 - R6
L3 : ADD R1 ← R2 + R3
L4 : STORE Memory[R4] ← R1
BRANCH to Label if R1 == 0
Which of the instructions L1, L2, L3, or L4 can legitimately occupy
the delay slot without any other program modification?
a) L1 b) L2
c) L3 d) L4
[GATE-2008]

Q.26 Which of the following must be true for the RFE (Return from
Exception) instruction on a general purpose processor?
1. It must be a TRAP instruction.
2. It must be a privileged instruction.
3. An exception cannot be allowed to occur during execution of an RFE
instruction.
a) 1 only b) 2 only
c) 1 and 2 d) 1, 2 and 3
[GATE-2008]
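The cycle count asked for in Q.23 can be estimated with a small timing model. The sketch below is a simplification and not taken from the original text: it assumes in-order issue, one instruction per stage at a time, and operand forwarding that makes a result usable as soon as the producer leaves EX, and it ignores any back-pressure a stalled instruction puts on earlier stages. The function name pipeline_cycles is hypothetical.

```python
def pipeline_cycles(instructions, stage_order):
    """Total cycles for an in-order pipeline with per-stage latencies.

    `instructions` is a list of dicts mapping stage name -> latency in cycles.
    """
    finish_prev = [0] * len(stage_order)   # finish times of previous instruction
    for latencies in instructions:
        finish_cur = []
        ready = 0                          # when this instruction leaves its previous stage
        for s, stage in enumerate(stage_order):
            # a stage starts once the instruction arrives and the stage is free
            start = max(ready, finish_prev[s])
            ready = start + latencies[stage]
            finish_cur.append(ready)
        finish_prev = finish_cur
    return finish_prev[-1]                 # cycle in which the last WB completes


stages = ["IF", "ID", "EX", "WB"]
add = {"IF": 1, "ID": 1, "EX": 1, "WB": 1}
mul = {"IF": 1, "ID": 1, "EX": 3, "WB": 1}
sub = {"IF": 1, "ID": 1, "EX": 1, "WB": 1}
print(pipeline_cycles([add, mul, sub], stages))
```

In this model SUB waits in front of EX until MUL has finished its three execute cycles, which is where the extra cycles beyond the ideal count come from.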
each for any instruction. The PO stage takes 1 clock cycle for ADD and
SUB instructions, 3 clock cycles for MUL instruction, and 6 clock
cycles for DIV instruction respectively. Operand forwarding is used in
the pipeline. What is the number of clock cycles needed to execute the
following sequence of instructions?
Instruction              Meaning of instruction
I0 : MUL R2, R0, R1      R2 ← R0 * R1
I1 : DIV R5, R3, R4      R5 ← R3 / R4
I2 : ADD R2, R5, R2      R2 ← R5 + R2
I3 : SUB R5, R2, R6      R5 ← R2 - R6
a) 3.4 b) 4.4
c) 5.1 d) 6.7
[GATE-2010]

Q.29 Consider an instruction pipeline with four stages (S1, S2, S3 and
S4) each with combinational circuit only. The pipeline registers are
required between each stage and at the end of the last stage. Delays
for the stages and for the pipeline registers are as given in the
figure.
What is the approximate speed up of the pipeline in steady state under
ideal conditions when compared to the corresponding non-pipeline
implementation?
a) 4.0 b) 2.5
c) 1.1 d) 3.0
[GATE-2011]

Q.30 Consider a hypothetical processor with an instruction of type
LW R1, 20(R2), which during execution reads a 32-bit word from memory
and stores it in a 32-bit register R1. The effective address of the
memory location is obtained by the addition of constant 20 and the
contents of register R2. Which of the following best reflects the
addressing mode implemented by this instruction for the operand in
memory?
a) Immediate addressing
b) Register addressing
c) Register indirect scaled addressing
d) Base indexed addressing
[GATE-2011]

Q.31 Register renaming is done in pipelined processors
a) as an alternate to register allocation at compile time
b) for efficient access to function parameters and local variables
c) to handle certain kinds of hazards
d) as part of address translation
[GATE-2012]

Q.32 Consider the following sequence of micro-operations.
MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Which one of the following is a possible operation performed by this
sequence?
a) Instruction fetch
b) Operand fetch
c) Conditional branch
d) Initiation of interrupt service
[GATE-2013]

Q.33 Consider an instruction pipeline with five stages without any
branch prediction. Fetch Instruction (FI), Decode Instruction (DI),
Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO).
The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns
and 6 ns, respectively. There are intermediate storage buffers after
each stage and the delay of each buffer is 1 ns. A program consisting
of 12 instructions I1, I2, I3, ..., I12 is
executed in this pipelined processor. Instruction I4 is the only
branch instruction and its branch target is I9. If the branch is taken
during the execution of this program, the time (in ns) needed to
complete the program is
a) 132 b) 185
c) 176 d) 328
[GATE-2013]

Q.34 A machine has a 32-bit architecture, with 1-word long
instructions. It has 64 registers, each of which is 32 bits long. It
needs to support 45 instructions, which have an immediate operand in
addition to two register operands. Assuming that the immediate operand
is an unsigned integer, the maximum value of the immediate operand is
______.
[GATE-2014-1]

Q.35 Consider a 6-stage instruction pipeline, where all stages are
perfectly balanced. Assume that there is no cycle-time overhead of
pipelining. When an application is executing on this 6-stage pipeline,
the speedup achieved with respect to non-pipelined execution if 25% of
the instructions incur 2 pipeline stall cycles is ___________.
[GATE-2014-1]

Q.36 Consider two processors P1 and P2 executing the same instruction
set. Assume that under identical conditions, for the same input, a
program running on P2 takes 25% less time but incurs 20% more CPI
(clock cycles per instruction) as compared to the program running on
P1. If the clock frequency of P1 is 1 GHz, then the clock frequency of
P2 (in GHz) is __________.
[GATE-2014-1]

Q.37 Consider the following processors (ns stands for nanoseconds).
Assume that the pipeline registers have zero latency.
P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns,
1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns,
0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns,
1 ns, 1.1 ns.
Which processor has the highest peak clock frequency?
a) P1 b) P2
c) P3 d) P4
[GATE-2014-3]

Q.38 An instruction pipeline has five stages, namely, instruction
fetch (IF), instruction decode and register fetch (ID/RF), instruction
execution (EX), memory access (MEM), and register write back (WB) with
stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively
(ns stands for nanoseconds). To gain in terms of frequency, the
designers have decided to split the ID/RF stage into three stages (ID,
RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into
two stages (EX1, EX2) each of latency 1 ns. The new design has a total
of eight pipeline stages. A program has 20% branch instructions which
execute in the EX stage and produce the next instruction pointer at
the end of the EX stage in the old design and at the end of the EX2
stage in the new design. The IF stage stalls after fetching a branch
instruction until the next instruction pointer is computed. All
instructions other than the branch instruction have an average CPI of
one in both the designs. The execution times of this program on the
old and the new design are P and Q nanoseconds, respectively. The
value of P/Q is ____.
[GATE-2014-3]
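One way to set up Q.38 is sketched below in Python. It is an illustrative model rather than an official solution: it assumes the clock period equals the slowest stage latency, that a branch resolved at the end of stage s stalls fetch for s - 1 cycles, and that every other instruction has a CPI of one. The function name time_per_instruction is hypothetical.

```python
def time_per_instruction(stage_latencies_ns, branch_fraction, resolve_stage):
    """Average time per instruction in ns for a pipeline that stalls on branches.

    `resolve_stage` is the 1-based stage at whose end the branch target is known.
    """
    cycle_ns = max(stage_latencies_ns)         # clock limited by slowest stage
    stall_cycles = resolve_stage - 1           # fetch idles until the target is known
    cpi = 1 + branch_fraction * stall_cycles   # non-branch instructions have CPI 1
    return cpi * cycle_ns


# Old design: IF, ID/RF, EX, MEM, WB; branches resolve at the end of EX (stage 3)
old = time_per_instruction([1, 2.2, 2, 1, 0.75], 0.2, resolve_stage=3)
# New design: IF, ID, RF1, RF2, EX1, EX2, MEM, WB; branches resolve at EX2 (stage 6)
new = time_per_instruction([1, 2.2 / 3, 2.2 / 3, 2.2 / 3, 1, 1, 1, 0.75], 0.2,
                           resolve_stage=6)
print(round(old / new, 2))
```

Because both designs run the same number of instructions, the ratio of per-instruction times equals P/Q.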
Q.39 For computers based on three-address instruction formats, each
address field can be used to specify which of the following:
S1: A memory operand
S2: A processor register
S3: An implied accumulator register
a) Either S1 or S2 b) Either S2 or S3
c) Only S2 and S3 d) All of S1, S2 and S3
[GATE-2015-1]

Q.40 Consider a non-pipelined processor with a clock rate of 2.5
gigahertz and average cycles per instruction of four. The same
processor is upgraded to a pipelined processor with five stages; but
due to the internal pipeline delay, the clock speed is reduced to 2
gigahertz. Assume that there are no stalls in the pipeline. The speed
up achieved in this pipelined processor is _______.
[GATE-2015-1]

Q.41 Consider the sequence of machine instructions given below:
MUL R5, R0, R1
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4
In the above sequence, R0 to R8 are general purpose registers. In the
instructions shown, the first register stores the result of the
operation performed on the second and the third registers. This
sequence of instructions is to be executed in a pipelined instruction
processor with the following 4 stages: (1) Instruction Fetch and
Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and
(4) Write back the result (WB). The IF, OF and WB stages take 1 clock
cycle each for any instruction. The PO stage takes 1 clock cycle for
ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock
cycles for DIV instruction. The pipelined processor uses operand
forwarding from the PO stage to the OF stage. The number of clock
cycles taken for the execution of the above sequence of instructions
is __________
[GATE-2015-2]

Q.42 Consider a processor with byte-addressable memory. Assume that
all registers, including Program Counter (PC) and Program Status Word
(PSW), are of size 2 bytes. A stack in the main memory is implemented
from memory location (0100)_16 and it grows upward. The stack pointer
(SP) points to the top element of the stack. The current value of SP
is (016E)_16. The CALL instruction is of two words, the first word is
the op-code and the second word is the starting address of the
subroutine (one word = 2 bytes). The CALL instruction is implemented
as follows:
Store the current value of PC in the stack
Store the value of PSW register in the stack
Load the starting address of the subroutine in PC
The content of PC just before the fetch of a CALL instruction is
(5FA0)_16. After execution of the CALL instruction, the value of the
stack pointer is
a) (016A)_16 b) (016C)_16
c) (0170)_16 d) (0172)_16
[GATE-2015-2]

Q.43 Consider the following code sequence having five instructions I1
to I5. Each of these instructions has the following format.
OP Ri, Rj, Rk
Where operation OP is performed on contents of registers Rj and Rk and
the result is stored in register Ri.
I1 : ADD R1, R2, R3
I2 : MUL R7, R1, R3
I3 : SUB R4, R1, R5
I4 : ADD R3, R2, R4
I5 : MUL R7, R8, R9
Consider the following three statements.
S1: There is an anti-dependence between instructions I2 and I5
S2: There is an anti-dependence between instructions I2 and I4
S3: Within an instruction pipeline an anti-dependence always creates
one or more stalls
Which one of the above statements is/are correct?
a) Only S1 is true
b) Only S2 is true
c) Only S1 and S3 are true
d) Only S2 and S3 are true
[GATE-2015-3]

Q.44 The stage delays in a 4-stage pipeline are 800, 500, 400 and 300
picoseconds. The first stage (with delay 800 picoseconds) is replaced
with a functionally equivalent design involving two stages with
respective delays 600 and 350 picoseconds. The throughput increase of
the pipeline is _____ percent.
[GATE-2016-1]

Q.45 A processor has 40 distinct instructions and 24 general purpose
registers. A 32-bit instruction word has an opcode, two register
operands and an immediate operand. The number of bits available for
the immediate operand field is _________.
[GATE-2016-2]

Q.46 Consider a processor with 64 registers and an instruction set of
size twelve. Each instruction has five distinct fields, namely,
opcode, two source register identifiers, one destination register
identifier, and a twelve-bit immediate value. Each instruction must be
stored in memory in a byte-aligned fashion. If a program has 100
instructions, the amount of memory (in bytes) consumed by the program
text is ______.
[GATE-2016-2]

Q.47 Consider a 3 GHz (gigahertz) processor with a three-stage
pipeline and stage latencies τ1, τ2, and τ3 such that
τ1 = 3τ2/4 = 2τ3. If the longest pipeline stage is split into two
pipeline stages of equal latency, the new frequency is ________ GHz,
ignoring delays in the pipeline registers.
[GATE-2016-2]

Q.48 Suppose the functions F and G can be computed in 5 and 3
nanoseconds by functional units UF and UG, respectively. Given two
instances of UF and two instances of UG, it is required to implement
the computation F(G(Xi)) for 1 ≤ i ≤ 10. Ignoring all other delays,
the minimum time required to complete this computation is ______
nanoseconds.
[GATE-2016-2]
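The field-width bookkeeping behind Q.45 and Q.46 can be sketched as follows. This is a hedged illustration, assuming each opcode and register field uses the minimum whole number of bits needed to distinguish its values; the helper names bits_for, immediate_bits and program_bytes are hypothetical.

```python
def bits_for(count: int) -> int:
    """Minimum number of bits needed to distinguish `count` values."""
    return (count - 1).bit_length()


def immediate_bits(word_bits, num_opcodes, num_registers, register_fields):
    """Bits left for the immediate after opcode and register fields."""
    return word_bits - bits_for(num_opcodes) - register_fields * bits_for(num_registers)


def program_bytes(num_instructions, num_opcodes, num_registers,
                  register_fields, imm_bits):
    """Program size when each instruction is rounded up to whole bytes."""
    instr_bits = bits_for(num_opcodes) + register_fields * bits_for(num_registers) + imm_bits
    instr_bytes = (instr_bits + 7) // 8        # byte-aligned storage
    return num_instructions * instr_bytes


# Q.45: 40 instructions, 24 registers, two register operands, 32-bit word
print(immediate_bits(32, 40, 24, register_fields=2))
# Q.46: 12 instructions, 64 registers, three register fields, 12-bit immediate
print(program_bytes(100, 12, 64, register_fields=3, imm_bits=12))
```

In Q.46 the fields add up to 34 bits, so byte alignment rounds each instruction up to 5 bytes.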
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(c) (b) (c) (b) (c) (d) (d) (a) (a) (c) (a) (b) (a) (b)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(a) (c) (a) (b) (a) (a) (d) (a) (a) (b) (b) (a) (d) (d)
29 30 31 32 33 34 35 36 37 38 39 40 41 42
(b) (b) (d) (c) (d) (b) 16383 4 1.6 (c) 1.54 (a) 3.2 13
43 44 45 46 47 48
(d) 33.33 16 500 4 28
EXPLANATIONS
Q.1 (c)
Indirect addressing: Pointers
Immediate addressing: Constants
Auto decrement addressing: Loops

absolute address referred to by a block of instruction. In this way
the processor is able to move the entire block from one region of main
memory to other.
Q.9 (a)
The value of register A remains the same when the instruction RRC A, # is inserted, as this statement does not affect the value stored in the register.

Q.10 (c)
The sum of the stage delays is 150 + 120 + 160 + 140 = 570 ns. A k-stage pipeline with cycle time t processes n tasks in time
Tk = [k + (n - 1)] t
The maximum stage delay tm is 160 ns.
Therefore, t = 160 + 5 = 165 ns, with k = 4 and n = 1000.
Therefore, Tk = [4 + (1000 - 1)] × 165 ns = 165495 ns ≈ 165.5 μs

Q.11 (a)
The following table gives each instruction, its size and the locations (in decimal) that it occupies.
Instruction     Instruction size   Location
MOV R1, 1000    2                  1000 to 1007
MOV R2, R1      1                  1008 to 1011
ADD R2, R3      1                  1012 to 1015
MOV 1005, R2    2                  1016 to 1023
HALT            1                  1024 to 1027
The return address saved on the stack is 1024. The interrupt occurs after the HALT instruction has been executed, when the CPU is halted.

Q.12 (b)
6 + 2 = 8
1 + 1 = 2
Thus, the total comes out to be 24.

Q.13 (a)
Given that R0 ← R0 + R1
The clock cycles operate as follows:
Cycle 1
Out : R1
In : S
Cycle 2
Out : R2
In : T
Cycle 3
Out : S, T
Add : ALU
In : R
Therefore, the execution cycle is completed in 3 clock cycles.

Q.14 (b)
As given
Rn <= PC + 1;
PC <= M[PC]
The clock cycles operate as follows:
Cycle 1
Out : PC
In : S, MAR
Cycle 2
Out : S
Increment : ALU
In : Rn
Cycle 3
Out : MDR
In : PC
Therefore, the execution cycle is completed in 3 clock cycles.
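
The pipeline timing worked out in Q.10 can be reproduced with a short Python sketch. The function and argument names are illustrative assumptions, not part of the original solution; the only formula used is Tk = [k + (n - 1)] t, with t taken as the slowest stage delay plus the 5 ns added in the explanation.

```python
def pipeline_time_ns(stage_delays_ns, extra_delay_ns, num_tasks):
    """Time for num_tasks on a k-stage pipeline: (k + n - 1) * t."""
    k = len(stage_delays_ns)
    # The clock must accommodate the slowest stage plus the extra 5 ns.
    cycle_ns = max(stage_delays_ns) + extra_delay_ns
    return (k + num_tasks - 1) * cycle_ns


total_ns = pipeline_time_ns([150, 120, 160, 140], 5, 1000)
print(total_ns, "ns, about", round(total_ns / 1000, 1), "microseconds")
```

The slowest stage, not the sum of the stage delays, sets the cycle time, which is why the 570 ns total never enters the formula.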
Thus, total number of clock cycles required = 10
required. Thus, the option mask = 0x1 << pos is correct.
Q.24 (b)
As given, the pipelined processor has four stages, i.e. IF, ID, EX, WB.
The number of clock cycles required by the ADD and SUB instructions is 1, and by the MUL instruction is 3.
In the pipelined processor, while one instruction is being fetched, another is either being decoded or executed, or some other action is being performed on it.
Thus, the number of cycles required by the given set of instructions can be obtained from the following diagram

Thus all the three statements are true as far as RFE is concerned.

Q.28 (d)
When i = 1
Number of cycles needed to execute the given loop
= 2 + 1 + 3 + 2 + 2 + 3 + 2 = 15
Thus total cycles required = 2 × 15 = 30

Q.29 (b)
As per the given, the instructions are arranged according to their meanings. We get the following:
Memory ← MBR
Analysis
1. The first micro-operation stores the value of PC into the Memory Buffer Register (MBR).
2. The second micro-operation stores the value of X into the Memory Address Register (MAR).
3. The third micro-operation stores the value of Y into PC.
4. The fourth micro-operation stores the value of MBR to memory.
Before these micro-operations execute, PC holds the address of the next instruction to be executed. We first store the value of PC in MBR and then, through MBR, in memory, i.e. we save the value of PC in memory and then load PC with a new value. This happens in only two kinds of operation: a conditional branch and the initiation of interrupt service. As no condition is checked here, it is an initiation of interrupt service.

Q.34 (b)
Instruction pipeline with five stages without any branch prediction:
Delays for FI, DI, FO, EI and WO are 5, 7, 10, 8, 6 ns respectively.
The maximum time taken by any stage is 10 ns, and an additional 1 ns is required for the delay of the buffer.
∴ The total time for an instruction to pass from one stage to another is 11 ns.
The instructions are executed in the following order
I1, I2, I3, I4, I9, I10, I11, I12
Execution with Time
Now when I4 is in its execution stage we detect the branch, and when I4 is in the WO stage we fetch I9. So the time for execution of instructions I9 to I12 is = 11 * 5 + (4 - 1) * 11 = 88 ns.
But we save 11 ns when fetching I9, i.e. I9 requires only 44 ns additional instead of 55 ns, because the time for fetching I9 can be overlapped with WO of I4.
∴ Total Time is = 88 + 88 - 11 = 165 ns

Q.35 (16383)
1 Word = 32 bits
Each instruction has 32 bits
To support 45 instructions, the opcode must contain 6 bits
Register operand 1 requires 6 bits, since the total registers are 64.
Register operand 2 also requires 6 bits
32 - 6 - 6 - 6 = 14 bits are left over for the immediate operand. Using 14 bits, the maximum value we can give is 16383, since 2^14 = 16384 (from 0 to 16383)

Q.36 (4)
For 6 stages, non-pipelining takes 6 cycles.
There were 2 stall cycles for pipelining for 25% of the instructions
So pipeline time = 1 + (25/100) × 2 = 3/2 = 1.5
Speed up = Non-pipeline time / Pipeline time = 6 / 1.5 = 4

Q.37 (1.6)
1 cycle time for p1 = 1 / (1 GHz) = 1 n.s
Assume p1 takes 5 cycles for a program; then p2 takes 20% more, i.e. 6 cycles.
p2 takes 25% less time, i.e. if p1 takes 5 n.s, then p2 takes 3.75 n.s.
Assume p2 clock frequency is x GHz.
p2 takes 6 cycles, so
6 / (x GHz) = 3.75 n.s, x = 1.6

Q.38 (C)
Clock period (CP) = max stage delay + overhead
So CP_P1 = Max(1, 2, 2, 1) = 2 ns
CP_P2 = Max(1, 1.5, 1.5, 1.5) = 1.5 ns
CP_P3 = Max(0.5, 1, 1, 0.6, 1) = 1 ns
CP_P4 = Max(0.5, 0.5, 1, 1, 1.1) = 1.1 ns
As frequency ∝ 1 / C.P, the least clock period gives the highest peak clock frequency. So, f_P3 = 1 / 1 ns = 1 GHz

Q.39 (1.54)
             No. of stages   Stall cycle   Stall frequency   Clock period   Avg. access time
Old design   5               2             20%               2.2 ns         P
New design   8               5             20%               1 ns           Q

Q.42 (13)

Q.43 (d)
I1) R1 ← R2 + R3
I2) R7 ← R1 × R3
I3) R4 ← R1 - R5
I4) R3 ← R2 + R4
I5) R7 ← R8 + R9
Anti dependence
i) -------- = x
j) x = -------
Then i and j are anti-dependent.
Hence I2 and I4 are anti-dependent.
⇒ Anti-dependence creates a stall in the pipeline

Q.44 (33.33)
Old design tp = 800
New design tp = 600
Throughput increase = (800 - 600) / 600 × 100% = 33.33%

Q.45 (16)
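
The field-width and pipeline arithmetic in Q.35, Q.36, Q.37, Q.44 and Q.45 can be checked with the Python sketch below. All function names are illustrative assumptions; the formulas are the ones used in the explanations (bits left after the opcode and register fields, speed-up as non-pipelined time over pipelined time, and throughput as the reciprocal of the cycle time).

```python
def bits_needed(count):
    """Smallest number of bits that can encode `count` distinct values."""
    return max(1, (count - 1).bit_length())


def immediate_bits(word_bits, num_instructions, num_registers, register_operands=2):
    """Bits left for the immediate field after the opcode and register fields."""
    used = bits_needed(num_instructions) + register_operands * bits_needed(num_registers)
    return word_bits - used


def pipeline_speedup(non_pipelined_cycles, stall_fraction, stall_cycles):
    """Non-pipelined cycles per instruction over pipelined cycles per instruction."""
    return non_pipelined_cycles / (1 + stall_fraction * stall_cycles)


def throughput_increase_percent(old_cycle, new_cycle):
    """Throughput is 1 / cycle time, so the gain is (old - new) / new."""
    return (old_cycle - new_cycle) / new_cycle * 100


print(immediate_bits(32, 45, 64))    # Q.35: bits for the immediate field
print(immediate_bits(32, 40, 24))    # Q.45
print(pipeline_speedup(6, 0.25, 2))  # Q.36
print(6 / 3.75)                      # Q.37: cycles / time in ns gives GHz
print(round(throughput_increase_percent(800, 600), 2))  # Q.44
```

In Q.44 the new cycle time is 600 ps because the slowest stage of the new design (600 ps) sets the clock; the 350 ps stage does not matter.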
3 CPU CONTROL DESIGN & INTERFACES
Q.1 Which is the most appropriate match for the items in the first column with the items in the second column:
List I                         List II
P. Indirect addressing         1. Array implementation
Q. Indexed addressing          2. Writing relocatable code
R. Base register addressing    3. Passing array as parameter
a) P-3, Q-1, R-2 b) P-2, Q-3, R-1
c) P-3, Q-2, R-1 d) P-1, Q-3, R-2
[GATE-2001]

Q.2 Consider the following data path of a simple non-pipelined CPU. The registers A, B, A1, A2, MDR, the bus and the ALU are 8-bit wide. SP and MAR are 16-bit registers. The MUX is of size 8 × (2:1) and the DEMUX is of size 8 × (1:2). Each memory operation takes 2 CPU clock cycles and uses MAR (Memory Address Register) and MDR (Memory Data Register). SP can be decremented locally.

a) test the interrupt system of the processor
b) implement co-routines
c) obtain system services which need execution of privileged instruction
d) return from subroutine
[GATE-2002]

Q.4 In the absolute addressing mode
a) the operand is inside the instruction
b) the address of the operand is inside the instruction
c) the register containing the address of the operand is specified inside the instruction
d) the location of the operand is implicit
[GATE-2002]

Q.5 A device employing INTR line for device interrupt puts the CALL instruction on the data bus while
a) INTA is active
b) HOLD is active
c) READY is active
d) None of these
[GATE-2002]
b) Vectored interrupts are not possible but multiple interrupting devices are possible
c) Vectored interrupts and multiple interrupting devices are both possible
d) Vectored interrupt is possible but multiple interrupting devices are not possible
[GATE-2005]

Q.8 Normally user programs are prevented from handling I/O directly by I/O instructions in them. For CPUs having explicit I/O instructions, such I/O protection is ensured by having the I/O instructions privileged. In a CPU with memory mapped I/O, there is no explicit I/O instruction. Which one of the following is true for a CPU with memory mapped I/O?
a) I/O protection is ensured by operating system routine(s)
b) I/O protection is ensured by a hardware trap
c) I/O protection is ensured during system configuration
d) I/O protection is not possible
[GATE-2005]

Q.9 Horizontal micro-programming
a) does not require use of signal decoders
b) results in larger sized micro-instructions than vertical micro-programming
c) uses one bit for each control signal
d) All of the above
[GATE-2006]

Q.10 Which of the following is/are true of the auto-increment addressing mode?
1. It is useful in creating self-relocating code.
2. If it is included in an Instruction Set Architecture, then an additional ALU is required for effective address calculation.
3. The amount of increment depends on the size of the data item accessed.
a) 1 only b) 2 only
c) 3 only d) 2 and 3
[GATE-2008]

Q.11 A CPU generally handles an interrupt by executing an interrupt service routine
a) as soon as an interrupt is raised
b) by checking the interrupt register at the end of fetch cycle
c) by checking the interrupt register after finishing the execution of the current instruction
d) by checking the interrupt register at fixed time intervals
[GATE-2009]

Q.12 On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.
Initialize the address register
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at address given by address register
Increment the address register
Decrement the count
If count != 0 go to LOOP
Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute. The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for
initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory. What is the approximate speed up when the DMA controller based design is used in place of the interrupt driven program based input-output?
a) 3.4 b) 4.4
c) 5.1 d) 6.7
[GATE-2011]

Q.13 The amount of ROM needed to implement a 4 bit multiplier is
a) 64 bit b) 128 bit
c) 1 kbit d) 2 kbit
[GATE-2012]

Q.14 Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is ______
[GATE-2014-2]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(a) (b) (c) (d) (a) (c) (b) (a) (c) (c) (c) (a) (d) 10000
EXPLANATIONS
Q.1 (a)
P. Indirect addressing → Passing array as parameter
Q. Indexed addressing → Array implementation
R. Base register addressing → Writing relocatable code

Q.2 (b)
From the given data it can be determined that the number of CPU clock cycles required to execute the "push r" instruction is 3.

Q.3 (c)
Some operations in the system can be performed only in a mode called supervisor mode. A software interrupt is an interrupt that is raised by executing an instruction. It can be used to interrupt a procedure at any desired location and is most importantly associated with a supervisor call, which provides the ability to switch from CPU user mode to supervisor mode.

Q.4 (d)

Q.5 (a)
When INTR is high and interrupts are enabled, the microprocessor completes the current instruction, disables the interrupt enable flip-flop and simultaneously sends an acknowledgement on INTA, which is active low, signalling that an interrupt is being serviced. During this time another interrupt cannot occur until the interrupt flip-flop is enabled again.

Q.6 (c)
In an (n × n) array, the longest path contains 2n - 1 gate cells. If each unit cell has a delay of θ(1), then the total delay comes out to be (2n - 1) θ(1), which corresponds to θ(n).

Q.7 (b)
In a single line interrupt system, vectored interrupts are not possible but multiple interrupting devices are possible, as such a system consists of a single interrupt request line and an interrupt grant line. More than one device may request an interrupt at the same time; in such cases only one request is granted, according to priority, as depicted by the following figure.

Q.8 (a)
To find the solution, the following points should be kept in mind:
1. An address bit pattern assigned to memory cannot be assigned to an I/O port, and vice versa.
2. Memory-mapped I/O requires that the same set of addresses is shared by the memory locations and I/O ports.
Therefore, I/O protection is ensured by the operating system routine(s).

Q.9 (c)
Detection of concurrently executable micro-operations is an important consideration for effective
horizontal micro-programming. Since it is highly machine dependent and requires knowledge of highly intricate features of a machine, only limited effort has been made so far to derive an algorithm for micro-program parallelism to enable optimization of horizontal micro-programs. Therefore, in horizontal microprogramming, one bit is used for each control signal.

Q.10 (c)
In the auto-increment addressing mode, the amount of increment depends purely on the size of the data accessed. For example:
Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
Regs[R2] ← Regs[R2] + d
d is the size of the data that is being accessed.

Q.11 (c)
The interrupt register is checked after finishing the execution of the current instruction. At this time, a CPU generally handles an interrupt by the execution of an interrupt service routine.

Q.12 (a)
Number of clock cycles required using the load-store approach = 2 + 500 × 7 = 3502, and using DMA = 20 + 500 × 2 = 1020
Required speed up = 3502 / 1020 ≈ 3.4

Q.13 (d)
The normal size of ROM is n × 2^n
∴ Now, we are multiplying two n-bit numbers.
So, the resultant has 2n bits.
Hence, the size of the ROM is 2n × 2^(2n)
In the question n = 4
Hence 2 × 4 × 2^(2 × 4)
= 8 × 2^8 = 2^3 × 2^8
= 2 × 2^10 = 2 Kbit

Q.14 (10000)
For each write request, the bus is occupied for 100 n.s
Each module is busy for 100 + 500 = 600 n.s per store, but the 8 modules overlap their internal operation, so the bus is the limiting resource and a new store can be initiated every 100 n.s.
In 100 n.s: 1 store
100 / 10^6 m.s = 1 store
1 m.s = 10^6 / 100 stores
= 10000 stores
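
The cycle counts and sizes in Q.12, Q.13 and Q.14 can be checked with the Python sketch below. The function names are illustrative assumptions. The module limit in `max_stores` is a rough bound that ignores start-up effects at the edges of the time window; it is only there to show that the bus, not the modules, is the bottleneck.

```python
def programmed_io_cycles(num_bytes, init_instructions=2):
    """Cycles for the loop in Q.12: load and store take 2 cycles, the rest 1."""
    per_iteration = 2 + 2 + 1 + 1 + 1  # load, store, increment, decrement, branch
    return init_instructions + num_bytes * per_iteration


def dma_cycles(num_bytes, overhead_cycles=20, cycles_per_byte=2):
    """Cycles for the DMA alternative: fixed overhead plus per-byte transfer."""
    return overhead_cycles + num_bytes * cycles_per_byte


def multiplier_rom_bits(n):
    """ROM for an n-bit by n-bit multiplier: 2^(2n) words of 2n bits each."""
    return 2 * n * 2 ** (2 * n)


def max_stores(window_ns, bus_ns, module_busy_ns, modules):
    """Stores that can be initiated in the window (bus limit vs. module limit)."""
    bus_limit = window_ns // bus_ns
    module_limit = modules * window_ns // module_busy_ns
    return min(bus_limit, module_limit)


speedup = programmed_io_cycles(500) / dma_cycles(500)
print(programmed_io_cycles(500), dma_cycles(500), round(speedup, 1))  # Q.12
print(multiplier_rom_bits(4))                                         # Q.13, in bits
print(max_stores(1_000_000, 100, 600, 8))                             # Q.14
```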
4 SECONDARY MEMORY & DMA
Q.1 What is the swap space in the disk used for?
a) Saving temporary HTML pages
b) Saving process data
c) Storing the super-block
d) Storing device drivers
[GATE-2005]

Q.2 The micro-instruction stored in the control memory of a processor has a width of 26 bit. Each micro-instruction is divided into three fields: a micro-operation field of 13 bit, a next address field (X), and a MUX select field (Y). There are 8 status bits in the inputs of the MUX.
How many bits are there in the X and Y fields, and what is the size of the control memory in number of words?
a) 10, 3, 1024 b) 8, 5, 256
c) 5, 8, 2048 d) 10, 3, 512
[GATE-2006]

Q.3 A hard disk with a transfer rate of 10 Mbyte/s is constantly transferring data to memory using DMA. The processor runs at 600 MHz, and takes 300 and 900 clock cycles to initiate and complete DMA transfer respectively. If the size of the transfer is 20 Kbyte, what is the percentage of processor time consumed for the transfer operation?
a) 5.0% b) 1.0%
c) 0.5% d) 0.1%
[GATE-2006]

Q.4 A device with data transfer rate 10 Kbyte/s is connected to a CPU. Data is transferred byte wise. Let the interrupt overhead be 4 μs. The byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program controlled mode?
a) 15 b) 25
c) 35 d) 45
[GATE-2006]

Q.5 Consider a disk drive with the following specifications:
16 surfaces, 512 tracks/surface, 512 sectors/track, 1 Kbyte/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode whereby whenever one 4 byte word is ready it is sent to memory; similarly, for writing, the disk interface reads a 4 byte word from the memory in each DMA cycle.
Memory cycle time is 40 ns. The maximum percentage of time that the CPU gets blocked during DMA operation is
a) 10 b) 25
c) 40 d) 50
[GATE-2006]

Q.6 Consider a disk pack with 16 surfaces of 128 tracks per surface and 256 sectors per track. 512 byte of data are stored in a bit serial manner in a sector. The capacity of the disk pack and the number of bits
required to specify a particular sector in the disk are respectively
a) 256 Mbyte, 19 bit
b) 256 Mbyte, 28 bit
c) 512 Mbyte, 20 bit
d) 64 Gbyte, 28 bit
[GATE-2007]

Q.7 For a magnetic disk with concentric circular tracks, the seek latency is not linearly proportional to the seek distance due to
a) non-uniform distribution of requests
b) arm starting and stopping inertia
c) higher capacity of tracks on the periphery of the platter
d) use of unfair arm scheduling policies
[GATE-2008]

Common Data for Questions 8 and 9
A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple (c, h, s), where c is the cylinder number, h is the surface number and s is the sector number. Thus, the 0th sector is addressed as (0, 0, 0), the 1st sector as (0, 0, 1), and so on.

Q.8 The address <400, 16, 29> corresponds to sector number
a) 505035 b) 505036
c) 505037 d) 505038
[GATE-2010]

Q.9 The address of the 1039th sector is
a) <0, 15, 31> b) <0, 16, 30>
c) <0, 16, 31> d) <0, 17, 31>
[GATE-2010]

Q.10 Consider a hard disk with 16 recording surfaces (0-15) having 16384 cylinders (0-16383) and each cylinder contains 64 sectors (0-63). Data storage capacity in each sector is 512 bytes. Data are organized cylinder-wise and the addressing format is <cylinder no., surface no., sector no.>. A file of size 42797 KB is stored in the disk and the starting disk location of the file is <1200, 9, 40>. What is the cylinder number of the last sector of the file, if it is stored in a contiguous manner?
a) 1281 b) 1282
c) 1283 d) 1284
[GATE-2013]

Q.11 Consider a disk pack with a seek time of 4 milliseconds and rotational speed of 10000 rotations per minute (RPM). It has 600 sectors per track and each sector can store 512 bytes of data. Consider a file stored in the disk. The file contains 2000 sectors. Assume that every sector access necessitates a seek, and the average rotational latency for accessing each sector is half of the time for one complete rotation. The total time (in milliseconds) needed to read the entire file is _________.
[GATE-2015-1]

Q.12 Consider a typical disk that rotates at 15000 rotations per minute (RPM) and has a transfer rate of 50 × 10^6 bytes/sec. If the average seek time of the disk is twice the average rotational delay and the controller’s transfer time is 10 times the disk transfer time, the average time (in milliseconds) to read or write a 512-byte sector of the disk is __________.
[GATE-2015-2]

Q.13 The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of 29,154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is_____.
[GATE-2016-1]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13
(b) (a) (a) (a) (b) (a) (c) (c) (c) (d) 14020 6.1 456
EXPLANATIONS
Q.1 (b)
Assume that two processes share the CPU. While one process is being executed on the CPU, the other is swapped out and all its data is saved on the disk; when the other one is in progress, all the data of the first process is saved on the disk. Thus the swap space is basically used for saving the process data.

Q.2 (a)
The micro-instruction is 26 bit wide, of which 13 bit form the micro-operation field, leaving 13 bit for the X and Y fields. The MUX has 8 status bits as inputs.
So, Y, the select line field, is of 3 bit and the next address field, X, is of 10 (13 - 3) bit. The size of control memory obtained = 2^10 = 1024

Q.3 (a)
The size of transfer = 20 Kbyte
Transfer rate of data = 10 Mbyte/s
Therefore, one transfer takes 20 Kbyte / (10 Mbyte/s) ≈ 2 ms
Processor clock cycles in 2 ms at 600 MHz = 600 × 10^6 × 2 × 10^-3 = 1.2 × 10^6
Clock cycles spent to initiate and complete the transfer = 300 + 900 = 1200
Percentage of processor time = 1200 / (1.2 × 10^6) × 100
= 0.1%

Q.4 (a)
Single byte transfer in interrupt mode takes 4 μs, so the achievable transfer rate is 1 / (4 μs) = 25 × 10^4 byte/s. 10 Kbyte/s is the data transfer rate of the device.
Whereas in program controlled mode the actual transfer takes place at the rate = 10^4 byte/s
Therefore, the minimum performance gain = 25 × 10^4 / 10^4 = 25

Q.5 (b)
As given
Revolutions per minute = 3000 rpm
= 50 revolutions per second (rps)
512 Kbyte of data can be read in one revolution. Number of 4-byte words that can be read in one revolution = 2^17
And in one second the number of words read = 2^17 × 50 = 6553600
Time taken by each DMA cycle = 40 ns
Thus, 6553600 DMA cycles take 6553600 × 40 ns = 0.2621 s
Therefore, the percentage of time the CPU is blocked = 0.2621 / 1 ≈ 26%
Thus, the answer is 25.

Q.6 (a)
The formula used is
Total disk size = Number of surfaces × Number of tracks × Number of sectors × Capacity of each sector.
Therefore from the given data, we get
Total disk size = 16 × 128 × 256 × 512 byte
= 2^8 × 2^20 byte = 2^8 megabyte
= 256 MB
Total number of sectors
= 16 × 128 × 256
= 2^19
So 19 bits are required to specify a sector.

Q.7 (c)
The seek latency is not linearly proportional to seek distance due to the higher capacity of tracks on the periphery of the platter. The higher capacity of the tracks is responsible for the presence of the desired cell in the wrong part and because of this a certain amount of time is required for this cell to reach the read-write head so that data transfer can take place.

Q.8 (c)
We have to find the sector number of the address <400, 16, 29>.
Therefore, 400 × 2 × 10 × 63 + 16 × 63 + 29 = 505037

Q.9 (c)
<0, 16, 31> corresponds to the sector number given by
16 × 63 + 31 = 1039

Q.10 (d)
42797 KB = 42797 × 1024 / 512 = 85594 sectors
Starting cylinder (1200, 9, 40) contains total 24 + (6 × 64) = 408 sectors of the file
Next, cylinders 1201, ..., 1283 contain total 1024 × 83 = 84992 sectors
(∵ each cylinder contains 16 × 64 = 1024 sectors)
∴ Total = 408 + 84992 = 85400 sectors
∴ The required cylinder number is 1284, which will contain the last sector of the file

Q.11 (14020)
Given
Seek time = 4 ms
60 s → 10000 rotations
Rotation time: 60 / 10000 s = 6 ms ← 1 rotation
∴ Rotational latency = 1/2 × 6 ms = 3 ms
1 track → 600 sectors
6 ms ← 600 sectors (1 rotation means 600 sectors (or) 1 track)
1 sector → 6 ms / 600 = 0.01 ms
2000 sectors → 2000 (0.01) = 20 ms
∴ Total time needed to read the entire file is
= 2000 (4 + 3) + 20
= 8000 + 6000 + 20 = 14020 ms

Q.12 (6.1)
60 sec → 15000 rotations
60 / 15000 s = 4 ms ← 1 rotation
Average rotational delay = 1/2 × 4 = 2 ms
As per question, average seek time = 2 × Avg. rotational delay
= 2 × 2 = 4 ms
1 sec → 50 × 10^6 bytes
Disk transfer time for 512 bytes = 512 / (50 × 10^6) s ≈ 0.01 ms
As per question, controller’s transfer time = 10 × 0.01 ms = 0.1 ms
Avg. time = 4 ms + 0.1 ms + 2 ms = 6.1 ms

Q.13 (456)
DMA controller needs ⇒ 29154 KB / 2^16 byte
⇒ 455.53125, rounded up to 456
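
The disk addressing and timing results in Q.8 to Q.13 can be checked with the Python sketch below. The helper names are illustrative assumptions. Sectors are numbered cylinder by cylinder, then surface by surface, as the (c, h, s) convention in the questions describes. For Q.12 the sketch also adds the small disk transfer time itself, which changes the result only in the second decimal place.

```python
def chs_to_sector(cylinder, surface, sector, surfaces, sectors_per_track):
    """Linear sector number of the address <cylinder, surface, sector>."""
    return (cylinder * surfaces + surface) * sectors_per_track + sector


def sector_to_chs(number, surfaces, sectors_per_track):
    """Inverse of chs_to_sector."""
    cylinder, rest = divmod(number, surfaces * sectors_per_track)
    surface, sector = divmod(rest, sectors_per_track)
    return cylinder, surface, sector


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


# Q.8 and Q.9: 10 platters x 2 surfaces = 20 surfaces, 63 sectors per track
print(chs_to_sector(400, 16, 29, 20, 63))
print(sector_to_chs(1039, 20, 63))

# Q.10: 16 surfaces, 64 sectors per track, file of 42797 KB in 512-byte sectors
first = chs_to_sector(1200, 9, 40, 16, 64)
file_sectors = ceil_div(42797 * 1024, 512)
last_cylinder, _, _ = sector_to_chs(first + file_sectors - 1, 16, 64)
print(file_sectors, last_cylinder)

# Q.11: seek + half a rotation + one sector of transfer, for each of 2000 sectors
rotation_ms = 60_000 / 10_000
q11_ms = 2000 * (4 + rotation_ms / 2 + rotation_ms / 600)
print(round(q11_ms))

# Q.12: seek is twice the rotational delay, controller time is 10x the transfer time
delay_ms = (60_000 / 15_000) / 2
transfer_ms = 512 / 50e6 * 1000
q12_ms = 2 * delay_ms + delay_ms + transfer_ms + 10 * transfer_ms
print(round(q12_ms, 1))

# Q.13: a 16-bit count register moves at most 2**16 bytes per bus grant
print(ceil_div(29154 * 1024, 2 ** 16))
```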
ASSIGNMENT QUESTIONS
Q.1 From a given tautology, another tautology can be derived by interchanging
a) 0 and 1
b) AND and OR
c) 0 and 1; AND and OR
d) impossible to always derive

Q.2 Which of the following logical operations produce a 0 if the inputs are 1, 1 and 0?
a) OR
b) AND
c) Exclusive-OR
d) Exclusive-NOR

Q.3 Choose the correct answer.
If x is a Boolean variable, then
a) 0 + x = x b) 1 + x = x
c) x + x = x d) x + x’ = 0

Q.4 If X, Y and Z are three Boolean variables then
a) X . X’ = 1
b) X (Y + Z) = (X + Y) (X + Z)
c) X + XZ = X
d) X + Y = Y + X

Q.5 Which of the following codes needs 7 bits to represent a character?
a) ASCII b) BCD
c) EBCDIC d) GRAY

Q.6 Which of the following are not weighted codes?
a) Roman number system
b) Decimal number system
c) Excess 3-code
d) Binary number system

Q.7 The minimum time delay between the initiations of two independent memory operations is called
a) access time b) cycle time
c) transfer rate d) latency time

Q.8 If X, Y and Z are 3 Boolean variables then X (Y + Z) equals (X + Y) (X + Z), if X, Y, Z take the values
a) 1, 0, 0 b) 0, 1, 0
c) 1, 1, 0 d) 0, 1, 1

Q.9 Which of the following comments about the Program Counter (PC) are true?
a) It is a register.
b) It is a cell in ROM.
c) During execution of the current instruction, its content changes.
d) None of the above

Q.10 If (123)_5 = (A3)_B, then the number of possible values of A is
a) 4 b) 1
c) 3 d) 2

Q.11 The speed imbalance between memory access and CPU operation can be reduced by
a) cache memory
b) memory interleaving
c) reducing the size of memory
d) none of the above

Q.12 If (12A)_3 = (123)_A, then the value of A is
a) 3 b) 3 or 4
c) 2 d) none of the above

Q.13 Choose the correct statement.
a) By scanning a bit pattern, one can say whether it represents data or not.
b) Whether a given piece of information is data or not depends on the particular application
c) Positive numbers can’t be represented in 2’s complement form.
d) Positive numbers can’t be represented in 1’s complement form.
Q.14 Which of the following does not need extra hardware for DRAM refreshing?
a) 8085 b) Motorola-6800
c) Z-80 d) None of the above

Q.15 The advantage of MOS devices over bipolar devices is
a) it allows higher bit densities and is also cost effective
b) it is easy to fabricate
c) its higher impedance
d) its operational speed

Q.16 The Boolean expression X + X’ Y equals
a) X + Y b) X + XY
c) Y + YX d) X’ Y + Y’ X

Q.17 ( X + Y) + Z = X ( Y + Z)
a) shows that the Boolean operator OR is distributive
b) shows that the Boolean operator OR is associative
c) implies the associativity of the Boolean operator AND
d) None of the above

Q.18 Which of the following are registers?
a) Accumulator
b) Stack pointer
c) Program counter
d) Buffer

a) It produces product of sum as the output.
b) It produces sum of products as the output.
c) It is dedicated for a particular operation.
d) It is general.

Q.22 Any given truth table can be represented by a
a) Karnaugh map
b) sum of product of Boolean expressions
c) product of sum of Boolean expressions
d) none of the above

Q.23 A number system uses 20 as the radix. The excess code that is necessary for its equivalent binary coded representation is
a) 4 b) 5
c) 6 d) 7

Q.24 Choose the correct statement.
a) Bus is a group of information carrying wires.
b) Bus is needed to achieve reasonable speed of operation.
c) Bus can carry data or address.
d) A bus can be shared by more than one device.
Q.28 The idea of cache memory is based on the
a) property of locality of reference
b) fact that only a small portion of a program is referenced relatively frequently
c) heuristic 90-10 rule
d) fact that references generally tend to cluster

Q.29 Which of the following weights makes the complement operation easier in BCD form?
a) 8-4-2-1 b) Excess-3
c) 2-4-2-1 d) 3-2-1-0

Q.30 The sequence of events that happen during a typical fetch operation is
a) PC → MAR → Memory → MDR → IR
b) PC → Memory → MDR → IR
c) PC → Memory → IR
d) PC → MAR → Memory → IR

Q.31 Any given Boolean expression can be implemented by using
a) Only NAND gates
b) Only NOR gates
c) Only OR gates
d) Only AND gates

Q.32 To get Boolean expression in the product of sum form, from a given Karnaugh map
a) don’t care conditions should not be present
b) don’t care conditions, if present, should not be taken as zeroes
c) one should cover all the 0’s present and complement the resulting expression
d) one should cover all the 1’s present and complement the resulting expression

Q.33 The Boolean expression AB + AB’ + A’C + AC is unaffected by the value of the Boolean variable
a) A b) B
c) C d) none of the above

Q.34 The minimum number of gates required to implement the Boolean expression AB + AB’ + A’C
a) 1 AND gate and 1 OR gate
b) 2 NAND gates
c) 3 AND gates and 2 OR gates
d) none of the above

Q.35 Property of locality of reference may fail if a program has
a) many conditional jumps
b) many unconditional jumps
c) many operands
d) none of the above

Q.36 Which of the following comments about half adder are true?
a) It adds 2 bits.
b) It is called so because a full adder involves two half-adders.
c) It does half the work of a full-adder.
d) It needs two inputs and generates two outputs.

Q.37 The binary equivalent of the decimal number 0.4375 is
a) 0.0111 b) 0.1011
c) 0.1100 d) 0.1010

Q.38 The Boolean expression (A + C)(AB’ + AC)(A’C’ + B’) can be simplified to
a) AB b) AB + A’C
c) A’B + BC d) AB + BC

Q.39 A byte addressable computer has a memory capacity of 2^m Kbytes and can perform 2^n operations. An instruction involving 3 operands and one operator needs a maximum of
a) 3m bits b) 3m + n bits
c) m + n bits d) none of the above

Q.40 In the previous problem, if the computer is word addressable with the word size being 8 bytes then the answer will be
a) 3m bits b) 3m + n bits
c) m + n bits d) none of the above
Q.41 The number of columns in a state table for a sequential circuit with ‘m’ flip-flops and ‘n’ inputs is
a) m + n b) m + 2n
c) 2m + n d) 2m + 2n

Q.42 A computer uses the ternary system instead of the traditional binary system. An ‘n’ bit string in the binary system will occupy
a) 3 + n ternary digits
b) 2n/3 ternary digits
c) n(log_2 3) ternary digits
d) n(log_3 2) ternary digits

Q.43 The Boolean expression A’BE + BCDE + BC’D’E + A’B’DE’ + B’C’DE’ can be simplified to BE + B’DE’, if the don’t care conditions are
a) ABCDE + AB’CDE’
b) ABCDE + AB’CDE’ + ABCD’E
c) ABC’DE + AB’CDE + ABCD’E
d) none of the above

Q.45 Which of the following does not have 8 data lines?
a) 8085 b) 8086
c) 8088 d) Z-80

Q.46 Which of the following logic families is well suited for high speed operation?
a) TTL b) ECL
c) MOS d) CMOS

Q.47 The following arrangement of the JK flip-flops does the function of a
a) Shift register
b) Mod-3 counter
c) Mod-2 counter
d) none of the above

Q.48 Negative numbers cannot be represented in
a) Signed magnitude form
b) 1’s complement form
c) 2’s complement form
d) none of the above

Q.49 The addressing mode used in the instruction of the form ADD X Y, is
a) absolute b) immediate
c) indirect d) index

Q.50 The combinational circuit in fig. below can be replaced by a single
a) OR gate b) XOR gate
c) NOR gate d) AND gate

Q.52 The XOR operator is
a) commutative
b) associative
c) distributive over AND operator
d) none of the above

Q.53 Bubble memories are preferable to floppy disks because
a) of their higher transfer rate
b) the cost needed to store a bit is less
c) they consume less power
d) of their reliability
c) Von Neumann
d) All of the above
computer?
a) SYPS b) MIPS
c) BAUD d) FLOPS
Q.57 If A ⊕ B = C (⊕ stands for the XOR operator), then
a) A ⊕ B = B
b) B ⊕ C = C
c) A ⊕ B ⊕ C = 0
d) none of above
Q.58 Which of the following operation(s) is/are not closed as regards to computers?
a) Addition b) Subtraction
c) Multiplication d) Division
Q.59 If (11A1B)_8 = (12C9)_16 (C stands for decimal 12), then the values of A and B are
a) 5, 1 b) 7, 5
c) 5, 7 d) none of the above
Q.60 The total number of possible Boolean functions involving ‘n’ Boolean variables is
a) infinitely many
b) n^n
c) n^2
d) none of the above
c) n 2 d) 22
Q.64 The value of x and y, if (x567)_8 + (2yx5)_8 = (71yx)_8 is
a) 4, 3 b) 3, 3
c) 4, 4 d) 4, 5
Q.65 The number of instructions needed to add ‘n’ numbers and store the result in memory using only one address instruction is
a) n b) 60
c) 70 d) 75
Q.66 The number of instructions needed to add ‘n’ numbers and store the result in memory using only one address instruction is
a) n
b) n - 1
c) n + 1
d) independent of n
Q.67 The Boolean expression corresponding to the circuit in figure below is
Q.68 The clock of a microprocessor can be divided by 5 using a
a) 3 bit counter b) 5 bit counter
c) mod 5 counter d) mod 3 counter
Q.69 The minimum cover for the maximum compatibility classes {ae, acd, ad, bd}
a) ae, acd, ad b) acd, ad, bd
c) ae, acd, bd d) ae, ad, bd
Q.70 The values of a, x, y if 47x80 is the 10’s complement of yaya0 are
a) 4, 3, 2 b) 5, 4, 4
c) 3, 4, 5 d) 2, 4, 5
Q.71 The reason for the presence of ALE pin in 8085, but not in 6800 is that
a) 8085 uses I/O mapped I/O
b) 876 ms
c) 850 ms
d) 900 ms
Q.72 If memory access takes 20 ns with cache and 110 ns without it, then the hit-ratio (cache uses a 10 ns memory) is
a) 93 % b) 90 %
c) 87 % d) 88 %
Q.74 Any instruction should have at least
a) 2 operands
b) 1 operand
c) 3 operands
d) none of the above
Q.75 Consider the circuit in Fig. below. In order to make it a tautology the ‘?’ marked box should be replaced by
a) an OR gate b) an AND gate
c) a NAND gate d) a NOR gate
Q.76 If the cache needs an access time of 20 ns and the main memory 120 ns, then the average access time of a CPU is (assume hit-ratio is 80%)
a) 30 ns b) 40 ns
c) 35 ns d) 45 ns
Q.77 The number of clock cycles necessary to complete 1 fetch cycle in 8085 (excluding wait state) is
a) 3 or 4 b) 4 or 5
c) 4 or 6 d) 3 or 5
Q.78 The seek time of a disk is 30 ms. It rotates at the rate of 30 rotations per second. Each track has a capacity of 300 words. The access time is approximately
a) 47 ms b) 50 ms
c) 60 ms d) 62 ms
Q.79 Motorola’s 68040 is comparable to
a) 8085 b) 80286
c) 80386 d) 80486
Q.81 Which of the following interrupts is both level and edge sensitive?
a) RST 5.5 b) INTR
c) RST 7.5 d) TRAP
Q.82 The difference between 80486 and 80386 is/are
a) presence of floating point co-processor
b) speed of operation
c) presence of 8 k cache on chip
d) presence of memory controller
Q.83 The addressing mode used in the instruction PUSH B is
a) direct b) register
c) register indirect d) immediate
Q.84 The most relevant addressing mode to write position independent code is
a) direct mode
b) indirect mode
c) relative mode
d) indexed mode
Q.85 Which of the following are CISC machines?
a) IBM 360 b) 80386
c) 68030 d) none of the above
Q.86 Which of the following rules regarding the addition of 2 given numbers is correct, if negative numbers are represented in 2’s complement form?
a) Add sign bit and discard carry, if any
b) Add sign bit and add carry, if any
c) Don’t add sign bit and discard carry, if any
d) Don’t add sign bit and add carry, if any
Q.87 When INTR is encountered, the processor branches to the memory location, which is
a) 0024H
b) determined by the ‘call address’ instruction issued by the I/O device
c) determined by the ‘RST n’ instruction issued by the I/O device
d) all of the above
Q.88 The advantage of a single bus over a multi-bus is the
a) Low cost
b) flexibility in attaching peripheral devices
c) high operating speed
d) all of the above
Q.89 The number of possible Boolean functions that can be defined for n Boolean variables over n-valued Boolean algebra is
a) 2^(2^n) b) 2^(n^2)
c) n^(2^n) d) n^(n^n)
Q.90 The ASCII code 56 represents the character
a) V b) 8
c) a d) carriage return
Q.91 Parallel printer uses
a) RS-232C interface
b) centronics interface
c) hand-shake mode
d) synchronous data transfer mode
Q.92 A micro programmed control unit
a) is faster than a hard-wired control unit
b) facilitates easy implementation of new instructions
c) is useful when very small programs are to be run
d) usually refers to the control of a microprocessor
Q.93 Which of the following are typical characteristics of a RISC machine?
a) Instruction taking multiple cycles
b) Highly pipelined
c) Instruction interpreted by micro programs
d) multiple register sets
Q.94 The working of a staircase switch is a typical example of the logical operation
a) OR b) NOR
c) Exclusive-OR d) Exclusive-NOR
Q.95 The exponent of a floating-point number is represented in excess-N code so that
a) the dynamic range is large
b) the precision is high
c) the smallest number is represented by all zeros
d) overflow is avoided
Q.96 On receiving an interrupt from an I/O device, the CPU
a) halts for a predetermined time
b) hands over control of address bus and data bus to the interrupting device
c) branches off to the interrupt service routine immediately
d) branches off to the interrupt service routine after completion of the current instruction.
Q.97 The Karnaugh map for the Boolean function F of 4 Boolean variables is given in Fig below. A, B, C are don’t care conditions. What values of A, B, C will result in the minimum expression?
a) A = B = C = 1 b) B = C = 1; A = 0
c) A = C = 1; B = 0 d) A = B = 1; C = 0
Q.98 In serial communication, an extra clock is needed
a) to synchronize the devices
b) for program baud rate control
c) to make efficient use of RS-232
d) none of the above
Q.99 If negative numbers are stored in 2’s complement form, the range of numbers that can be stored in 8 bits is
a) -128 to +128 b) -128 to +127
c) -127 to +128 d) -127 to +127
Q.100 If SUB A,B means B ~ A, then SUB 4(R0), *5(R1) means ((X) means contents of register or memory location X)
a) (((R1)+5)) – (4 * (R0) ))
b) (( (R1)+5)) – ( (R0)+4))
c) (( R1)+5) – (4 * (R0) )
d) (( R1)+4) – (R0 +4 ))
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(c) (b) (a) (c) (a) (a) (b) (b) (a) (b) (a) (d) (b) (c)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(a) (a) (b) (a) (a) (c) (b) (a) (c) (a) (a) (b) (c) (a)
29 30 31 32 33 34 35 36 37 38 39 40 41 42
(c) (a) (a) (c) (b) (d) (a) (a) (a) (a) (d) (d) (c) (d)
43 44 45 46 47 48 49 50 51 52 53 54 55 56
(c) (b) (b) (b) (b) (d) (a) (d) (a) (a) (c) (d) (c) (b)
57 58 59 60 61 62 63 64 65 66 67 68 69 70
(a) (a) (d) (d) (c) (a) (a) (a) (d) (c) (a) (c) (c) (d)
71 72 73 74 75 76 77 78 79 80 81 82 83 84
(c) (b) (b) (d) (c) (b) (c) (a) (d) (b) (d) (a) (c) (c)
85 86 87 88 89 90 91 92 93 94 95 96 97 98
(a) (a) (b,c) (a) (d) (b) (b) (b) (b) (c) (c) (d) (d) (b)
99 100
(b) (b)
EXPLANATIONS
Q. 2 (b)
Exclusive OR takes the value 0 if there is an even number of 1’s.
Q. 3 (a)
X can take the value of either 1 or 0. Substitute and verify the identities.
Q. 4 (c)
Form the truth table and check the correctness of options (c) and (d).
Q. 8 (b)
Substitute and verify each of the possibilities.
Q. 9 (a)
During execution of the current instruction the content is incremented so that it points to the next instruction.
Q. 10 (b)
Converting to decimal form, the given equation is
3 + (2 × 5) + (1 × 5 × 5) = 3 + A × B, i.e., 38 = A × B + 3.
So, A × B = 35.
Possible values for A, B are 1, 35; 5, 7; 7, 5; 35, 1. The pairs 7, 5 and 35, 1 are infeasible, as permissible digits for a number in base ‘r’ are 0, 1, 2, ..., (r - 1). Hence 1 and 5 are the possible values of A.
Q. 12 (d)
Refer Qn. 10. Converting to decimal form, A + 2 × 3 + 1 × 3 × 3 = 3 + 2 × A + 1 × A × A. Solving for A, we get A = -4 or 3. Both are infeasible.
Q. 13 (b)
The contents of a word may represent an instruction or data. Just by looking at the contents, it is not possible to attach any meaning to it. A word pointed to by the programme counter is an instruction. Otherwise it need not be. Also, the word data has a context-sensitive meaning. One can write a programme in Pascal that needs radius as the input data. The programme, as a whole, is input data for the compiler during the compilation process.
Q. 16 (a)
X + X’Y = X . 1 + X’Y
= X (1 + Y) + X’Y
= X . 1 + XY + X’Y
= X + (X + X’) Y
= X + 1 . Y = X + Y
If that sounds quite unnatural, here is another way. Let K = X + X’Y (we have to find K).
Complementing both sides,
K’ = (X + X’Y)’ = X’ (X + Y’)
= X’X + X’Y’ = 0 + X’Y’
= X’Y’
Again complementing both sides, K = (X’Y’)’ = X + Y.
Hence the answer is (a).
Q. 17 (b)
Obviously it shows it is associative. It implies (by the law of duality) the associativity of AND also. Complementing both sides,
(X + (Y + Z))’ = ((X + Y) + Z)’
X’(Y’Z’) = (X’Y’)Z’ (by De Morgan’s law)
Q. 22 (a)
Karnaugh map is just a pictorial representation of the truth table. By covering the 1’s, we get the sum of products form. By covering the 0’s and then complementing, we get the product of sums form.
Q. 23 (c)
Consider the decimal digit 5. Its BCD representation is 0101. If complemented, we get 1010, i.e., 15 - 5. In general, complementing x gives 15 - x. But the correct complemented value should be 9 - x. The difference of 6 can be nullified by going for excess-3 code (3 because using it twice, i.e., during the conversion and reconversion process, one can account for the excess 6). If a number system uses 20 as the radix, each digit needs 5 bits in the equivalent BCD form. So, the complement of x gives 31 - x. But the correct value is 19 - x. To account for the excess 31 - 19, i.e., 12, we have to use excess-6 code. E.g., take 11; its complement should be 19 - 11 = 8. In excess-6 code, we add 6 to 11, to get 17. Complementing, we get 31 - 17 = 14. If we subtract the excess 6, we get 14 - 6 = 8, which is the required answer.
Q. 24 (a)
Q. 25 (a)
By NAND gate as follows.
By NOR gate as follows.
Q. 28 (a)
90-10 is a heuristic rule that says 90 % of the execution time is spent on 10 % of the code.
Q. 29 (c)
Consider the decimal digit 5. Its BCD form is 0101. Complementing, we get 1010, which is decimal 10. To make 1010 correspond to decimal 4 (which is the correct complement of 5), we can assign the weights 2 - 4 - 2 - 1. This way 1010 will be decimal 4.
Q. 31 (a)
NOR and NAND are universal. NAND can be simulated by NOR as follows.
NAND (A, B) = A’ + B’
NOR (A, A) = A’
NOR (B, B) = B’
NOR (A’, B’) = (A’ + B’)’
= AB
NOR (AB, AB) = (AB)’
= A’ + B’ = NAND (A, B)
So, it suffices to prove NAND is a universal gate.
If that is true, it should be able to simulate any Boolean operator. Since the basic operations are OR, AND and complementation, refer Qn. 25 to see how OR can be simulated.
It is simple to simulate complementation.
NAND (A, A) = A’
AND can be simulated as follows.
NAND (A, B) = (AB)’
NAND ((AB)’, (AB)’) = AB
Hence the correct answers are (a) and (b).
Q.32 (c)
Don’t care conditions need or need not be present. If present, they need or need not be used. If they aid in the simplification process, we use them to our advantage. Otherwise they are literally don’t care.
Q.33 (b)
AB + AB’ + A’C + AC
= A (B + B’) + (A’ + A)C
= A (1) + (1) C = A + C, which is independent of B.
Q.34 (c)
The given expression is AB + AB’ + A’C = A (B + B’) + A’C = A(1) + A’C
= A + A’C = A + C (Refer Qn. 16)
So, one needs just a single OR gate to implement the given Boolean expression.
Q.38 (a)
(A + C) (AB’ + AC)
= AAB’ + AAC + ACB’ + CAC
= AB’ + AC + CAB’ + AC (since X . X = X)
= AB’ + CAB’ + AC (since X + X = X)
So the given Boolean expression is
(AB’ + CAB’ + AC)(A’C’ + B’)
= AB’A’C’ + AB’B’ + CAB’A’C’ + CAB’B’ + ACA’C’ + ACB’
= 0 + AB’ + 0 + CAB’ + 0 + ACB’
= AB’ + ACB’ = AB’ (1 + C)
= AB’
Q.39 (d)
To specify a particular operation out of the 2^n possible operations, one needs n bits. As the machine is byte addressable, to specify a particular byte we need (m + 10) bits. So, one needs 3 (m + 10) + n = 3m + n + 30 bits.
Q.40 (d)
Refer Qn. 39.
If it is word addressable, then the number of words is 2^(m + 10) divided by 2^3, i.e., 2^(m + 7) words.
So, one needs 3 (m + 7) + n = 3m + n + 21 bits.
Q.41 (c)
It is 2m + n: ‘n’ columns for the ‘n’ inputs; 2m columns for storing the ‘m’ present states and ‘m’ next states.
Q.43 (c)
The term A’BE corresponds to A – 0; B – 1; E – 1; C – 0 or 1; D – 0 or 1. Similarly mark all the 1’s and get the Karnaugh map as above. The 1’s can be covered in the optimal way if the slots marked X are set to 1’s. So the three X’s in the positions ABCD’E, ABC’DE, AB’CDE’ are the don’t care conditions to be set to 1 and used. Hence the answer is (c).
Q.50 (d)
The circuit is (A’ + B’)’ = A . B.
Q.51 (a)
To convert to base 8, we group in 3’s, because 2^3 = 8.
To convert to base 16, we group in 4’s, because 2^4 = 16.
To convert to base 32, we group in 5’s, because 2^5 = 32.
Grouping in 5’s, from the right, we can get the answer.
Q.52 (a)
It is commutative because A ⊕ B = B ⊕ A.
It is associative because (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).
It is not distributive over AND because A ⊕ (B AND C) = (A ⊕ B) AND (A ⊕ C) is not true. For example,
1 ⊕ (0 AND 1) = 1
but (1 ⊕ 0) AND (1 ⊕ 1) = 1 AND 0 = 0
Q.57 (a)
X ⊕ X = 0 (construct the truth table and verify)
So, A ⊕ B = C
A ⊕ (A ⊕ B) = A ⊕ C
(A ⊕ A) ⊕ B = A ⊕ C
0 ⊕ B = A ⊕ C
B = A ⊕ C
Similarly, (b) and (c) can be proved.
Q.59 (d)
Converting to base 2, the equation reads 001 001 A 001 B = 0001 0010 1100 1001. Here A, B stand for a group of binary digits. So, grouping the right hand side in 3’s, from the right and matching corresponding
groups in both the sides, we get B = 001 and A = 011. So, A = 3 and B = 1.
Q.60 (d)
A single Boolean variable can take the value either 0 or 1, i.e., 2 possible ways. So, ‘n’ Boolean variables can take 2 × 2 × 2 ... (n times) values, i.e., 2^n values. So, the truth table will have 2^n rows. Each row can be assigned one of the 2 values 0 or 1. So, totally 2^(2^n) functions are possible. So, none of the given options is correct.
Q.62 (a)
A 2-input multiplexer can select a single line out of the two input lines. To select a single line out of the 2^10, i.e., 1024 input lines, we have to use 1023 two-input multiplexers. In order to select 512 lines out of the 1024, we need 512 two-input multiplexers. Continuing this way, to ultimately get a single line, we need a total of 512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 1023 two-input multiplexers.
Q.63 (a)
By definition a binary operator defined on a set A is a function F : A × A → A. The domain, i.e., A × A has n × n elements (because A has n elements). Each of these n^2 elements can be mapped to one of the ‘n’ elements of A. So, totally n^(n^2) binary operations are possible.
Q.64 (a)
Add 7 and 5. It yields 4 and carries 1 (since it is an octal addition). So, x is 4. Similar reasoning, after substituting x = 4, yields y = 3.
Q.65 (d)
The number M will be such that 10^24 ≤ M < 10^25.
A decimal number y needs approximately log y (log is to base 2) number of bits. So the binary representation of M needs more than log 10^24 bits, but less than log 10^25 bits. Log 10^24 is greater than log 8^24 (= log 2^72 = 72 log 2 = 72). So, more than 72 bits are needed. The nearest answer is 75.
Q.66 (c)
A typical one address instruction uses that address to specify one operand; the other operand will be in the accumulator by default. So, to add n given numbers a1, a2, ..., an, first transfer a1 to the accumulator. Next the instruction ADD a2 adds the content of a2 to the accumulator and leaves the sum there. Continuing this way, we need n instructions to add n numbers and place the result in the accumulator. Finally, to store the result in memory 1 more instruction is needed. So, (n + 1) instructions are needed.
Q.67 (a)
The inputs to the NAND gate are (A + B’)’ and (A’ + B)’, i.e., A’B and AB’. So, the output of the NAND gate will be (A’BAB’)’ = 0’ = 1. So F is always 1. Hence it is a tautology.
Q.69 (c)
Let us denote the given classes in a tabular form as follows:
The ‘e’ column has only one 1. That corresponds to ae. So, ae has to be present in the minimal cover. Analyzing the ‘c’ and ‘b’ columns, we find acd and bd have to be included. Hence the answer.
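The octal addition in Q.64 can also be confirmed by trying every digit pair. This is a minimal sketch (the helper name is an assumption) that searches all octal digits x and y satisfying (x567)_8 + (2yx5)_8 = (71yx)_8.

```python
def solve_octal_puzzle():
    """Return all digit pairs (x, y) with (x567)_8 + (2yx5)_8 == (71yx)_8."""
    solutions = []
    for x in range(8):
        for y in range(8):
            left = int(f"{x}567", 8) + int(f"2{y}{x}5", 8)
            right = int(f"71{y}{x}", 8)
            if left == right:
                solutions.append((x, y))
    return solutions

print(solve_octal_puzzle())   # [(4, 3)]
```

The single solution agrees with the column-by-column reasoning: 7 + 5 gives digit 4 with a carry, which then forces y = 3.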
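The instruction count in Q.66 can be illustrated with a toy one-address machine. The opcode names LOAD, ADD and STORE and the memory layout are assumptions made for this sketch; a real instruction set may differ.

```python
def run(program, memory):
    """Execute (opcode, address) pairs on a machine with a single accumulator."""
    acc = 0
    for opcode, address in program:
        if opcode == "LOAD":        # accumulator <- memory[address]
            acc = memory[address]
        elif opcode == "ADD":       # accumulator <- accumulator + memory[address]
            acc += memory[address]
        elif opcode == "STORE":     # memory[address] <- accumulator
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory

def sum_program(n):
    """Build a program that adds memory[0..n-1] and stores the sum at address n."""
    program = [("LOAD", 0)]
    program += [("ADD", i) for i in range(1, n)]
    program.append(("STORE", n))
    return program

values = [3, 1, 4, 1, 5]
memory = values + [0]               # last cell receives the result
program = sum_program(len(values))
run(program, memory)
print(len(program), memory[-1])     # 6 14
```

For n = 5 the program has one load, four adds and one store, i.e., n + 1 = 6 instructions.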
Q.70 (d)
From the definition of 10’s complement,
10^5 = 47x80 + yaya0
So, a is 2; y is 5; x is 4.
Q.72 (b)
Let m be the hit-ratio. Then 20 = 10 × m + (1 - m) × 110.
Solving we get m = 0.9, i.e., 90 %
Q.74 (d)
Operations on a stack need no operand address as only the top of stack can be accessed. Top of stack will be stored in a register dedicated for this purpose. Only the operation (e.g. POP) needs to be specified.
Q.75 (c, d)
To make F a tautology, its value has to be 1 for all input combinations.
Q.80 (b)
Totally 256 functions are possible (refer Qn. 60). We have to find how many of these 256 functions satisfy the condition f(X, Y, Z) = f(X’, Y’, Z’), e.g., if f(0, 1, 1) = 0 then f(1, 0, 0) has to be 0.
This constraint makes only half of the truth table, i.e., 4 rows, take independent values. So we have 2^4 = 16 possible functions.
Q.89 (d)
There are ‘n’ Boolean variables. Each can take one of the n possible values 0, 1, 2, ..., n - 1. So, the truth table will have n^n rows. Now each row can take one of the ‘n’ values as the output value. So, the possible number of functions is n × n × n ... (n^n times), i.e., n^(n^n).
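The hit-ratio equation used in Q.72 can be rearranged and evaluated directly. The sketch below assumes the same simple model as the explanation, average time = h × cache time + (1 - h) × main memory time; the function names are illustrative only.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Average access time under the simple model h*t_cache + (1-h)*t_main."""
    return hit_ratio * t_cache + (1 - hit_ratio) * t_main

def hit_ratio(t_avg, t_cache, t_main):
    """Solve t_avg = h*t_cache + (1-h)*t_main for h."""
    return (t_main - t_avg) / (t_main - t_cache)

print(hit_ratio(20, 10, 110))                          # 0.9
print(round(average_access_time(0.8, 20, 120), 6))     # 40.0
```

The second call applies the same model to the figures of Q.76 (20 ns cache, 120 ns main memory, 80 % hit ratio) and gives the 40 ns listed in the answer key.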
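The count of 16 in Q.80 can be confirmed by enumerating all 256 three-variable Boolean functions and keeping those with f(X, Y, Z) = f(X’, Y’, Z’). This is a brute-force sketch with a hypothetical function name, not a method given in the original.

```python
from itertools import product

def count_complement_invariant(n=3):
    """Count Boolean functions of n variables unchanged when every input is complemented."""
    inputs = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=len(inputs)):
        f = dict(zip(inputs, outputs))               # one candidate truth table
        if all(f[x] == f[tuple(1 - b for b in x)] for x in inputs):
            count += 1
    return count

print(count_complement_invariant())   # 16
```

Each input row is paired with its complement, so only half of the rows are free, which is where 2^4 comes from.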