Computer Organization and Architecture

For
COMPUTER SCIENCE
SYLLABUS
Machine instructions and addressing modes, ALU and data-path, CPU control design,
Memory interface, I/O interface (Interrupt and DMA mode), Instruction pipelining,
Cache and main memory, Secondary storage.
ANALYSIS OF GATE PAPERS

Exam Year     1 Mark Ques.   2 Mark Ques.   Total
2003               2              3            8
2004               1              7           15
2005               4              8           20
2006               1              7           15
2007               2              6           14
2008               -             12           24
2009               2              4           10
2010               1              4            9
2011               1              4            9
2012               2              2            6
2013               1              4            9
2014 Set-1         1              3            7
2014 Set-2         1              3            7
2014 Set-3         1              2            5
2015 Set-1         1              1            3
2015 Set-2         1              2            5
2015 Set-3         1              2            5
2016 Set-1         1              2            5
2016 Set-2         1              5           11
2017 Set-1         4              5           14
2017 Set-2         4              3           10
© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission
CONTENTS
Topics Page No
1. OVERVIEW OF COMPUTER SYSTEM
1.1 Introduction 01
1.2 Functional Units 01
1.3 Numbers and Arithmetic Operations 02
1.4 Decimal Fixed-Point Representation 04
1.5 Floating Point Representation 04
1.6 Signed-Operand Multiplication 05
1.7 Booth’s Algorithm 05
1.8 Integer Division 06
1.9 Non-Restoring-division Algorithm 07
1.10 Floating-Point Numbers and Operations 07
2. INSTRUCTIONS
2.1 Instruction cycle 13

2.2 Addressing Modes 14
2.3 Instruction Formats 14
2.4 Instruction Interpretation 17
2.5 Microprogrammed Control 19
2.6 Wilkes Design 19
2.7 Horizontal and Vertical Microinstructions 20
3. MEMORY ORGANIZATION
3.1 Introduction 22
3.2 Memory Hierarchy 22
3.3 Memory Characteristics 25
3.4 Semiconductor Ram Memories 26
3.5 Virtual Memory Technology 35
3.6 Advantages of using Virtual Memory 36
3.7 Paging, Segmentation and Paged Segments 37
3.8 Secondary Memory Technology 40
4. INPUT AND OUTPUT UNIT
4.1 I/O Mapping / Addressing Methods 44

4.2 IOP (IO Processor) 45
4.3 Direct memory Access 46
4.4 Steps involved in the DMA operation 48
4.5 Interrupt-Initiated I/O 50
4.6 Data Transfer Techniques 50
4.7 Responsibilities of I/O Interface 52
4.8 IBM 370 I/O Channel 53
4.9 Polling 55
4.10 Independent Requesting 55
4.11 Local Communication 56
5. MULTIPLE PROCESSOR ORGANISATION
5.1 Flynn’s Classification of Computer Organization 60

5.2 Multiprocessor 61
5.3 Parallel Processing Applications 61
5.4 Multiprocessor Architecture 62
5.5 Loosely Coupled Multiprocessor 62
5.6 Serial Communication 63
5.7 Asynchronous Transmission 63
5.8 Synchronous Transmission 64
5.9 Solved Examples 64
6. GATE QUESTIONS 66
7. ASSIGNMENT QUESTIONS 100
1 OVERVIEW OF COMPUTER SYSTEM
1.1 INTRODUCTION

A digital computer, or simply computer, is a fast electronic calculating machine that accepts digitized input information, processes it according to a list of internally stored instructions, and produces the resulting output information. The list of instructions is called a computer program, and the internal storage is called computer memory. Many types of computers exist, differing in factors such as size, cost, computational power and intended use.

1.2 FUNCTIONAL UNITS

A computer consists of five functionally independent main parts: the input, memory, arithmetic and logic, output, and control units.

1.2.1 Input unit
• Accepts coded information from human operators, from electromechanical devices such as keyboards, or from other computers over digital communication lines.
• Many other kinds of input devices are available, including joysticks, trackballs, and mice. These are often used as graphic input devices in conjunction with displays.

1.2.2 Memory Unit
• The function of the memory unit is to store programs and data. There are two classes of storage, called primary and secondary.
• Primary storage is a fast memory that operates at electronic speeds. Programs must be stored in the memory while they are being executed.
• The memory contains a large number of semiconductor storage cells, each capable of storing one bit of information.
• Programs must reside in the memory during execution. Instructions and data can be written into the memory or read out under the control of the processor. Memory in which any location can be reached in a short and fixed amount of time after specifying its address is called random-access memory (RAM).
• The time required to access one word is called the memory access time. This time is fixed, independent of the location of the word being accessed.
• The small, fast RAM units are called caches. They are tightly coupled with the processor and are often contained on the same integrated circuit to achieve high performance.
• The main memory is the largest and slowest unit of primary storage. Although primary storage is essential, it tends to be expensive. Thus, additional, cheaper secondary storage is used when large amounts of data and many programs that are infrequently used have to be stored.

1.2.3 Arithmetic and Logic Unit
• Most operations are executed in the arithmetic and logic unit (ALU) of the processor.
For example: to add two numbers, they are brought into the processor, and the actual addition is carried out by the ALU.
• Any other arithmetic or logic operation, such as multiplication or division, is initiated by bringing the required operands into the processor, where the operation is performed by the ALU.
• The control unit and the arithmetic and logic unit are many times faster than the other devices connected to a computer system. This enables a single processor to control a number of external devices such as keyboards, displays, and magnetic and optical disks.

1.2.4 Output Unit
The output unit is the counterpart of the input unit. Its function is to send processed results to the outside world, for example to a printer.

1.2.5 Control Unit
• The memory, arithmetic and logic, and input and output units store and process information and perform input and output operations. The control unit is effectively the centre that sends control signals to the other units and senses their states.

The operation of a computer:
• The computer accepts information in the form of programs and data through an input unit and stores it in the memory.
• Information stored in the memory is fetched, under program control, into an arithmetic and logic unit, where it is processed.
• Processed information is output through an output unit, and all activities inside the machine are directed by the control unit.

1.3 NUMBERS & ARITHMETIC OPERATIONS
Computers are built using logic circuits that operate on information represented by two values, 0 and 1. The amount of information carried by one such value is defined as a bit of information.

1.3.1 Number Representation
Consider an n-bit vector
$C = c_{n-1} \dots c_1 c_0$
where $c_i = 0$ or $1$ for $0 \le i \le n-1$. This vector can represent unsigned integer values V in the range 0 to $2^n - 1$, where
$V(C) = c_{n-1} \times 2^{n-1} + \dots + c_1 \times 2^1 + c_0 \times 2^0$
Three systems are used for representing positive and negative numbers:

1. Sign and Magnitude
• The leftmost bit is 0 for positive numbers and 1 for negative numbers.
• Negative values are represented by changing the most significant bit from 0 to 1 in the vector C of the corresponding positive value.
• For example:
+5 → 0101
-5 → 1101

2. 1’s Complement
• The leftmost bit is 0 for positive numbers and 1 for negative numbers.
• Negative values are obtained by complementing each bit of the corresponding positive number.
For example: -3 is obtained by complementing each bit in the vector 0011 to yield 1100.
• The same operation is used for converting a negative number to the corresponding positive value. Forming the 1’s complement of a given number is equivalent to subtracting that number from $2^n - 1$.

3. 2’s Complement
• The leftmost bit is 0 for positive numbers and 1 for negative numbers.
• Forming the 2’s complement of a number is done by subtracting that number from $2^n$. Hence, the 2’s complement of a number is obtained by adding 1 to the 1’s complement of that number.
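
The three signed representations can be compared with a short Python sketch. The function names are illustrative only and are not part of the original notes; the sketch assumes the value fits in n bits and does no range checking.

```python
def sign_magnitude(value, n):
    """Sign bit followed by the (n-1)-bit magnitude."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "0{}b".format(n - 1))


def ones_complement(value, n):
    """Negative values are (2**n - 1) minus the magnitude, i.e. every bit flipped."""
    if value >= 0:
        return format(value, "0{}b".format(n))
    return format((2 ** n - 1) - abs(value), "0{}b".format(n))


def twos_complement(value, n):
    """Negative values are 2**n minus the magnitude."""
    if value >= 0:
        return format(value, "0{}b".format(n))
    return format(2 ** n - abs(value), "0{}b".format(n))


if __name__ == "__main__":
    # Print the 4-bit patterns for the values used in the text.
    for v in (+5, -5, -3):
        print(v, sign_magnitude(v, 4), ones_complement(v, 4), twos_complement(v, 4))
```

For the 4-bit examples in the text, -5 in sign and magnitude is 1101 and -3 in 1’s complement is 1100; adding 1 to the latter gives the 2’s complement pattern 1101.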
1.3.2 Arithmetic Addition

1. In Signed-Magnitude form
• Addition follows the rules of ordinary arithmetic.
If the signs are the same → add the two magnitudes and give the sum the common sign.
If the signs are different → subtract the smaller magnitude from the larger and give the result the sign of the larger magnitude.
For example:
(+35) + (-37) = -(37 - 35) = -2

2. In 2’s complement form
• The system does not require a comparison or subtraction; only addition and complementation are necessary.
• The procedure is as follows: add the two numbers, including their sign bits, and discard any carry out of the sign (leftmost) bit position.

Note: Negative numbers must initially be in 2’s complement form, and if the sum obtained after the addition is negative, it is in 2’s complement form.
For example:
 +6   00000110         -6   11111010
+13   00001101        +13   00001101
---------------       ---------------
+19   00010011         +7   00000111

1.3.3 Arithmetic Subtraction
• When negative numbers are in 2’s complement form, subtraction of two signed numbers is very simple and can be done as follows:
• Take the 2’s complement of the subtrahend (including the sign bit).
• Add it to the minuend (including the sign bit).
• A carry out of the sign bit position is discarded.
• Changing a positive number to a negative number is easily done by taking its 2’s complement, and vice versa.
For example: (-6) - (-13) = +7
In binary, this is written as
11111010 - 11110011
The subtraction is changed to addition by taking the 2’s complement of the subtrahend (-13) to give (+13). In binary, this is
11111010 + 00001101 = 100000111
and removing the end carry, we obtain the answer 00000111 → (+7).

1.3.4 Overflow in Integer Arithmetic
• In the 2’s complement number representation system, n bits can represent values in the range $-2^{n-1}$ to $+2^{n-1} - 1$. When the result of an arithmetic operation is outside the representable range, an arithmetic overflow has occurred.
• While adding unsigned numbers, the carry out from the most significant bit position serves as the overflow indicator. This does not work for adding signed numbers.
For example: using 4-bit signed numbers, if we try to add the numbers +7 and +4, the output is 1011 → -5, an incorrect result (with carry-out = 0).
Note: Overflow may occur if both summands have the same sign. The addition of numbers with different signs cannot cause overflow.
• A simple way to detect overflow is to examine the signs of the two summands X and Y and the sign of the result S. When both operands X and Y have the same sign, an overflow occurs when the sign of S is not the same as the signs of X and Y.
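
The addition, subtraction and overflow rules above can be sketched in Python as follows. Operands are passed as n-bit patterns (non-negative integers below 2**n); the function names are illustrative assumptions, not taken from the original notes, and the subtraction helper does not report overflow.

```python
def add_twos_complement(x_bits, y_bits, n):
    """Add two n-bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    total = (x_bits + y_bits) & mask  # the carry out of the sign position is discarded
    same_sign_inputs = (x_bits & sign) == (y_bits & sign)
    # Overflow: both summands share a sign and the result's sign differs from it.
    overflow = same_sign_inputs and (total & sign) != (x_bits & sign)
    return total, overflow


def negate(bits, n):
    """2's complement: flip every bit, then add 1 (kept to n bits)."""
    mask = (1 << n) - 1
    return ((bits ^ mask) + 1) & mask


def subtract_twos_complement(x_bits, y_bits, n):
    """x - y computed as x + (2's complement of y)."""
    total, _ = add_twos_complement(x_bits, negate(y_bits, n), n)
    return total


def to_signed(bits, n):
    """Interpret an n-bit pattern as a signed 2's complement value."""
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits
```

With these helpers, `subtract_twos_complement(0b11111010, 0b11110011, 8)` reproduces the (-6) - (-13) example, and `add_twos_complement(0b0111, 0b0100, 4)` flags the +7 plus +4 overflow.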
1.4 DECIMAL FIXED-POINT REPRESENTATION
• The representation of decimal numbers in registers is a function of the binary code used to represent a decimal digit. A 4-bit decimal code requires four flip-flops for each decimal digit.

Disadvantages
• Representing numbers in decimal wastes storage space, since the number of bits needed to store a decimal number in a binary code is greater than the number of bits needed for its equivalent binary representation.
• The circuits required to perform decimal arithmetic are more complex.

Advantage
• Applications such as business data processing require only small amounts of arithmetic computation, which can be done directly in decimal format.
• The representation of signed decimal numbers in BCD is similar to the representation of signed numbers in binary. The sign of a decimal number is usually represented with four bits to conform with the 4-bit code of the decimal digits.
• The signed-magnitude system is difficult to use with computers. The signed-complement system can be either the 9’s or the 10’s complement, but the 10’s complement is the one most commonly used. To obtain the 10’s complement of a BCD number, we first take the 9’s complement and then add one to the least significant digit. The 9’s complement is calculated by subtracting each digit from 9.
• The subtraction of decimal numbers, either unsigned or in the signed-10’s complement system, is done as in binary: take the 10’s complement of the subtrahend and add it to the minuend.

1.5 FLOATING POINT REPRESENTATION
Example:
+6132.789
Fraction: +0.6132789
Exponent: +04
• Floating point is always interpreted to represent a number in the form $m \times r^e$. Only the mantissa m and the exponent e are physically represented in the register (including their signs). The radix r and the radix-point position of the mantissa are always assumed.
• A floating-point binary number is represented in a similar manner, except that it uses base 2 for the exponent.
For example: the binary number +1001.11 is represented with an 8-bit fraction and a 6-bit exponent as follows.
Fraction    Exponent
01001110    000100
• A floating-point number is said to be normalized if the most significant digit of the mantissa is nonzero.
For example: the decimal number 250 is normalized but 00035 is not. Regardless of where the position of the radix point is assumed to be in the mantissa, the number is normalized only if its leftmost digit is nonzero. The 8-bit binary number 00011010, for instance, is not normalized because of its three leading 0’s. It can be normalized by shifting three positions to the left and discarding the leading 0’s to obtain 11010000; the exponent must then be reduced by 3 to keep the same value. Normalized numbers provide the maximum possible precision for the floating-point number. A zero cannot be normalized because it has no nonzero digit; it is represented in floating point by all 0’s in the mantissa and exponent.
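
Normalization by left shifting can be sketched as below. This is a minimal illustration under stated assumptions: the mantissa is a bit string with the radix point assumed at its left end, the input 00011010 is chosen so that three left shifts yield the 11010000 quoted in the text, and the starting exponent of 5 is an arbitrary value picked for the demonstration.

```python
def normalize_binary(mantissa_bits, exponent):
    """Shift the mantissa left until its leftmost bit is 1.

    Each left shift doubles the mantissa, so the exponent is reduced
    by one per shift to keep the represented value unchanged.
    """
    if "1" not in mantissa_bits:
        # Zero cannot be normalized: all 0s in mantissa and exponent.
        return mantissa_bits, 0
    shift = mantissa_bits.index("1")          # number of leading 0s
    normalized = mantissa_bits[shift:] + "0" * shift
    return normalized, exponent - shift


if __name__ == "__main__":
    print(normalize_binary("00011010", 5))
```

Here 0.00011010 × 2^5 and 0.11010000 × 2^2 both equal 3.25, so the shift and the exponent adjustment cancel out.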
1.6 SIGNED-OPERAND MULTIPLICATION

The multiplication of signed operands generates a double-length product in the 2’s complement number system. In general, partial products are accumulated by adding versions of the multiplicand as selected by the multiplier bits.
Case (i): Positive multiplier and negative multiplicand.
• When we add a negative multiplicand to a partial product, we must extend the sign-bit value of the multiplicand to the left as far as the product will extend.
For example: the 5-bit signed operand -13 is the multiplicand; multiplied by +11, it gives the product -143.
Note: For a negative multiplier, a solution is to form the 2’s complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier. An algorithm called Booth’s algorithm works equally well for both negative and positive multipliers.

1.7 BOOTH’S ALGORITHM

Booth’s algorithm is a powerful algorithm for signed-number multiplication: it generates a 2n-bit product and treats both positive and negative numbers uniformly.
Consider a multiplication operation in which the multiplier is positive and has a single block of 1s, for example, 0011110. To derive the product, we could add four appropriately shifted versions of the multiplicand, as in the standard procedure. However, we can reduce the number of required operations by regarding this multiplier as the difference between two numbers:

  0100000   (32)
- 0000010    (2)
-----------------
  0011110   (30)

This suggests that the product can be generated by adding $2^5$ times the multiplicand to the 2’s complement of $2^1$ times the multiplicand. For convenience, we can describe the sequence of required operations by recoding the preceding multiplier as 0 +1 0 0 0 -1 0.

• In general, in Booth’s scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left. Figure 9 illustrates the normal and the Booth’s algorithms for the example just discussed. Booth’s algorithm clearly extends to any number of blocks of 1s in a multiplier, including the situation in which a single 1 is considered a block; see figure 10 for another example of recoding the multiplier. In this example, the least significant bit is 1. This situation is handled uniformly by assuming that an implied 0 lies to its right.
• Booth’s algorithm can also be used for a negative multiplier, as the figure shows. To see the correctness of this technique in general, we use a property of negative number representations in the 2’s complement system. Let the leftmost zero of a negative number, X, be at bit position k, that is
X = 11…10x_{k-1}…x_0
The value of X is given by
$V(X) = -2^{k+1} + x_{k-1} \times 2^{k-1} + \dots + x_0 \times 2^0$
This is supported by observing that

    11…100…0
  + 00…00x_{k-1}…x_0
  ------------------
X = 11…10x_{k-1}…x_0

• The top number is the 2’s complement representation of $-2^{k+1}$. The recoded multiplier now consists of the part corresponding to the second number, with -1 added in position k+1. For example, the multiplier 110110 becomes 0 -1 +1 0 -1 0.

Table: Booth multiplier recoding table

Multiplier bit i   Multiplier bit i-1   Version of multiplicand selected by bit i
       0                   0                  0 × M
       0                   1                 +1 × M
       1                   0                 -1 × M
       1                   1                  0 × M

• Booth’s technique for recoding multipliers is summarized in the above table. The transformation 011…110 → +100…0-10 is called skipping over 1s. This term is derived from the case in which the multiplier has its 1s grouped into a few contiguous blocks; only a few versions of the multiplicand, that is, the summands, must be added to generate the product, thus speeding up the multiplication operation. However, in the worst case, that of alternating 1s and 0s in the multiplier, each bit of the multiplier selects a summand. In fact, this results in more summands than if Booth’s algorithm were not used. A 16-bit worst-case multiplier, an ordinary multiplier, and a good multiplier are shown in figure 12.
• Booth’s algorithm has three notable features:
1. It handles both positive and negative multipliers uniformly.
2. It achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
3. The speed gained by skipping over 1s depends on the data. On average, the speed of doing multiplication with Booth’s algorithm is the same as with the normal algorithm.
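
The recoding table can be tried out with the following Python sketch. It models the arithmetic of Booth recoding (summing signed, shifted copies of the multiplicand) rather than the register-level shift-and-add hardware, and the function names are illustrative assumptions that do not appear in the original notes.

```python
def booth_recode(multiplier_bits):
    """Return the Booth digits (most significant first) for a 2's complement bit string.

    Digit i is (bit i-1) - (bit i): +1 when moving from 1 to 0,
    -1 when moving from 0 to 1, scanning from right to left.
    """
    padded = multiplier_bits + "0"  # implied 0 to the right of the least significant bit
    digits = []
    for i in range(len(multiplier_bits)):
        bit, right_neighbour = int(padded[i]), int(padded[i + 1])
        digits.append(right_neighbour - bit)
    return digits


def booth_multiply(multiplicand, multiplier, n):
    """Multiply two signed integers using the Booth digits of the n-bit multiplier."""
    bits = format(multiplier & ((1 << n) - 1), "0{}b".format(n))
    product = 0
    for position, digit in enumerate(reversed(booth_recode(bits))):
        # digit is -1, 0 or +1; the multiplicand is shifted left by the bit position.
        product += digit * (multiplicand << position)
    return product


if __name__ == "__main__":
    print(booth_recode("0011110"))
    print(booth_multiply(-13, 11, 5))
```

Recoding 0011110 gives the digits 0 +1 0 0 0 -1 0 from the text, and multiplying -13 by +11 with a 5-bit multiplier gives -143, matching the example in section 1.6.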
1.8 INTEGER DIVISION

[Figure: decimal division and the binary-coded division of the same value]

A circuit that implements division by the longhand method operates as follows:
• It positions the divisor appropriately with respect to the dividend and performs a subtraction.
• If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
• If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.

In the restoring-division algorithm, register M holds the divisor, register Q holds the dividend (and finally the quotient), and register A is initially 0. Do the following n times:
• Shift A and Q left one binary position.
• Subtract M from A, and place the answer back in A.
• If the sign of A is 1, set q0 to 0 and add M back to A (i.e. restore A); otherwise, set q0 to 1.
• This algorithm can be improved by avoiding the need for restoring A after an unsuccessful subtraction.
• Consider the sequence of operations that takes place after the subtraction operation in the preceding algorithm.
• If A is positive → shift left and subtract M, i.e. we perform 2A - M.
• If A is negative → we restore it by performing A + M, and then we shift it left and subtract M. This is equivalent to performing 2A + M.
• The q0 bit is appropriately set to 0 or 1 after the correct operation has been performed.

1.9 NON-RESTORING-DIVISION ALGORITHM

Step 1:
Do the following n times.
• If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
• If the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2:
If the sign of A is 1, add M to A. Step 2 is needed to leave the proper positive remainder in A at the end of n cycles.
Note: The restore operations are no longer needed, and exactly one add or subtract operation is performed per cycle.

1.10 FLOATING-POINT NUMBERS AND OPERATIONS

• Up to this point, we have dealt only with fixed-point numbers and have considered them as integers, that is, as having an implied binary point at the right end of the number. It is just as easy to assume that the binary point is just to the right of the sign bit, thus representing a fraction. In the 2’s complement system, the signed value F, represented by the n-bit binary fraction
$C = c_0 . c_{-1} c_{-2} \dots c_{-(n-1)}$
is given by
$F(C) = -c_0 \times 2^0 + c_{-1} \times 2^{-1} + c_{-2} \times 2^{-2} + \dots + c_{-(n-1)} \times 2^{-(n-1)}$
where the range of F is
$-1 \le F \le 1 - 2^{-(n-1)}$
• Consider the range of values representable in a 32-bit, signed, fixed-point format. Interpreted as integers, the value range is approximately 0 to $\pm 2.15 \times 10^{9}$. If we consider them to be fractions, the range is approximately $\pm 4.66 \times 10^{-10}$ (that is, $2^{-31}$) to $\pm 1$.
• Hence, we need to accommodate both very large integers and very small fractions. To do this, a computer must be able to represent numbers and operate on them in such a way that the position of the binary point is variable and is automatically adjusted as computation proceeds. Such a representation is called a floating-point representation.
• Because the position of the binary point in a floating-point number is variable, it must be given explicitly in the floating-point representation.
• By convention, when the decimal point is placed to the right of the first (nonzero) significant digit, the number is said to be normalized. Thus, a floating-point number representation is one in which a number is represented by its sign, a string of significant digits, known as the mantissa, and an exponent to an implied base for the scale factor.

1.10.1 IEEE STANDARD FOR FLOATING-POINT NUMBERS

• A general form is
$\pm X_1 . X_2 X_3 X_4 X_5 X_6 X_7 \times 10^{\pm Y_1 Y_2}$
where $X_i$ and $Y_i$ are decimal digits. Both (i) the number of significant digits (7) and (ii) the exponent range ($\pm 99$) are sufficient for a wide range of calculations. It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits. A 24-bit mantissa can approximately represent a 7-digit decimal number, and an 8-bit exponent to an implied base of 2 provides a scale factor with a reasonable range. One bit is needed for the sign of the number. Because the leading nonzero bit of a normalized binary mantissa must be a 1, it does not have to be included explicitly in the representation. Thus, a total of 32 bits is needed.
• The standard explained above for representing floating-point numbers in 32 bits has been developed and specified in detail by the Institute of Electrical and Electronics Engineers (IEEE). This standard describes both the representation and the way in which the four basic arithmetic operations are to be performed.
• The 32-bit representation is given in the figure below.
• The sign of the number is given in the first bit, followed by a representation for the exponent (to the base 2) of the scale factor.
• Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E’ = E + 127. This is called the excess-127 format. Thus E’ is in the range $0 \le E' \le 255$. The end values of the range, 0 and 255, are used to represent special values. Therefore, the range of E’ for normal values is $1 \le E' \le 254$, which means that the actual exponent, E, is in the range $-126 \le E \le 127$.
• The last 23 bits represent the mantissa. The most significant bit of the mantissa is always equal to 1 because binary normalization is used. This bit is not explicitly represented; it is assumed to be to the immediate left of the binary point. Hence, the 23 bits stored in the M field actually represent the fractional part of the mantissa, i.e. the bits to the right of the binary point. The following figure shows an example of a single-precision floating-point number.
• The 32-bit standard representation in the figure is called a single-precision representation because it occupies a single 32-bit word.
• The scale factor has a range of $2^{-126}$ to $2^{+127}$, approximately $10^{\pm 38}$.
• The 24-bit mantissa provides approximately the same precision as a 7-digit decimal value.
• To provide more precision and range for floating-point numbers, the IEEE standard also specifies a double-precision format, as shown in the figure below.
• The double-precision format has increased exponent and mantissa ranges.
• The 11-bit excess-1023 exponent E’ has the range $1 \le E' \le 2046$ for normal values, with 0 and 2047 used to indicate special values.
• The actual exponent E is in the range $-1022 \le E \le 1023$, providing scale factors of $2^{-1022}$ to $2^{1023}$. The 53-bit mantissa provides a precision equivalent to about 16 decimal digits.

1.10.2 Basic aspects of operating with floating-point numbers:
• If a number is not normalized, it can be put in normalized form by shifting the fraction and adjusting the exponent. The following figure shows an unnormalized value $0.0010110\ldots \times 2^{9}$ and its normalized version, $1.0110\ldots \times 2^{6}$. Since the scale factor is in the form $2^i$, shifting the mantissa right or left by one bit position is compensated by an increase or a decrease of 1 in the exponent.
• As computations proceed, a number that does not fall in the representable range of normal numbers might be generated. In single precision, this means that its normalized representation requires an exponent less than -126 (underflow) or greater than +127 (overflow).
1.10.3 Arithmetic operations on floating-point numbers:
The rules for addition and subtraction can be stated as follows:

Add/Subtract Rule
1. Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the result.
4. Normalize the resulting value, if necessary.

Multiplication and division are somewhat easier than addition and subtraction, in that no alignment of mantissas is needed.

Multiply Rule
1. Add the exponents and subtract 127.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.

1.10.4 Divide Rule
1. Subtract the exponents and add 127.
2. Divide the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
The addition or subtraction of 127 in the multiply and divide rules results from using the excess-127 notation for exponents.

1.10.5 IEEE 754 Format Parameters

1.10.6 Precision Consideration
Prior to a floating-point operation, the exponent and significand of each operand are loaded into the ALU registers. In the case of the significand, the length of the register is almost always greater than the length of the significand plus an implied bit. The register contains additional bits, called guard bits, which are used to pad out the right end of the significand with 0’s.

1.10.7 Rounding
A number of techniques have been explored for performing rounding.

1. Round to nearest    The result is rounded to the nearest representable number.
2. Round toward +∞     The result is rounded up toward plus infinity.
3. Round toward -∞     The result is rounded down toward negative infinity.
4. Round toward 0      The result is rounded toward zero.

Note: Round to nearest is the default rounding mode listed in the standard.

1.10.8 IEEE STANDARD FOR BINARY FLOATING-POINT ARITHMETIC
• Infinity:
Infinity arithmetic is treated as the limiting case of real arithmetic, with the infinity values given the following interpretation.
-∞ < (every finite number) < +∞
For example:
6 + (+∞) = +∞
6 / (+∞) = +0
6 × (+∞) = +∞
(+∞) - (-∞) = +∞
• Quiet and Signaling NaNs
A NaN is a symbolic entity encoded in floating-point format, of which there are two types:
(i) Signaling
(ii) Quiet

(i) Signaling
A signaling NaN signals an invalid operation exception whenever it appears as an operand. Signaling NaNs afford values for uninitialized variables and arithmetic-like enhancements that are not the subject of the standard.
(ii) Quiet
A quiet NaN propagates through almost every arithmetic operation without signaling an exception.

Note: Both types of NaN have the same general format: an exponent of all ones and a nonzero fraction. The actual bit pattern of the nonzero fraction is implementation dependent; the fraction values can be used to distinguish quiet NaNs from signaling NaNs and to specify particular exception conditions.

1.10.9 Table: Operations that Produce a Quiet NaN

Operation          Quiet NaN produced by
Any                Any operation on a signaling NaN
Add or subtract    Magnitude subtraction of infinities:
                   (+∞) + (-∞)
                   (-∞) + (+∞)
                   (+∞) - (+∞)
                   (-∞) - (-∞)
Multiply           0 × ∞
Division           0 / 0 or ∞ / ∞
Remainder          x REM 0 or ∞ REM y
Square root        √x where x < 0

1.10.10 Denormalized Numbers
Denormalized numbers are useful for handling exponent underflow; therefore they are included in IEEE 754.
• When the exponent of the result becomes too small (a negative exponent with too large a magnitude), the result is denormalized by right shifting the fraction and incrementing the exponent for each shift, until the exponent is within a representable range.
• The above figure explains the effect of the addition of denormalized numbers. The representable numbers can be grouped into intervals of the form $[2^n, 2^{n+1}]$.
• Within each such interval, the exponent portion of the number remains constant while the fraction varies, producing a uniform spacing of representable numbers within the interval.
• As we approach zero, each successive interval is half the width of the preceding interval but contains the same number of representable numbers. Hence, the density of representable numbers increases as we approach zero.
• If only normalized numbers are used, there is a gap between the smallest normalized number and 0. In the 32-bit IEEE 754 format, there are $2^{23}$ representable numbers in each interval, and the smallest representable positive normalized number is $2^{-126}$. With the addition of denormalized numbers, an additional $2^{23} - 1$ numbers are uniformly added between 0 and $2^{-126}$.
• Without denormalized numbers, the gap between the smallest representable nonzero number and zero is much wider than the gap between the smallest representable nonzero number and the next larger number.
• The use of denormalized numbers is referred to as gradual underflow. Gradual underflow fills in the gap and reduces the impact of exponent underflow to a level comparable with round-off among the normalized numbers.
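
The special values discussed in sections 1.10.8 to 1.10.10 can be built from their single-precision bit patterns with Python’s `struct` module. This is only a sketch: the helper name `bits_to_single` and the constant names are illustrative assumptions. Note that Python itself raises an exception for 0.0 / 0.0 and for the square root of a negative number instead of returning a NaN, so those table rows are not exercised here.

```python
import math
import struct


def bits_to_single(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


POS_INF = bits_to_single(0x7F800000)            # exponent all ones, fraction zero
QUIET_NAN = bits_to_single(0x7FC00000)          # exponent all ones, fraction nonzero
SMALLEST_NORMAL = bits_to_single(0x00800000)    # E' = 1, fraction 0: 2**-126
SMALLEST_DENORMAL = bits_to_single(0x00000001)  # E' = 0, fraction 1: 2**-149
LARGEST_DENORMAL = bits_to_single(0x007FFFFF)   # E' = 0, fraction all ones

# Infinity behaves as the limiting case of real arithmetic.
infinity_results = [6 + POS_INF, 6 * POS_INF, POS_INF - (-POS_INF)]

# Operations from the quiet-NaN table that Python evaluates without raising.
nan_results = [POS_INF - POS_INF, 0 * POS_INF, POS_INF / POS_INF]

if __name__ == "__main__":
    print(infinity_results, 6 / POS_INF)
    print(nan_results)
    print(SMALLEST_DENORMAL, LARGEST_DENORMAL, SMALLEST_NORMAL)
```

The denormalized values continue the uniform spacing of 2^-149 all the way down to zero, which is the gap-filling effect that gradual underflow refers to.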
2 INTRODUCTIONS
2.1 INSTRUCTION CYCLE

• The basic function of a processor is to execute instructions. These instructions are stored sequentially in memory. To execute an instruction, the processor fetches it from memory and executes it step by step.
• The sequence of operations required to process an instruction is called the instruction cycle.
• The instruction cycle is divided into two cycles: the fetch cycle and the execute cycle.

2.1.1 Sequence of Operation in Instruction Cycle

• The processor contains two registers used for fetch and execute operations.

a) Program Counter (PC):
• Points to the instruction to be executed; its contents are used as the address from which the instruction is read from memory.
• The program counter is incremented after each fetch so that it points to the next instruction, which maintains instruction sequencing.
• During branch operations the branch address is loaded into the program counter, so the next instruction is fetched from the branch address.

b) Instruction Register (IR):
• The instruction read from memory using the program counter is transferred to the instruction register.
• The instruction register contains the instruction to be executed. Its output is provided to the control circuits.
• The control circuit generates the signals required to execute the instruction.

In addition to these registers, the processor contains two more registers used in the instruction cycle, as shown in the following figure.

1. Address register
• It is used to provide the address of a memory location.
• The address held in the address register is used to read data from, or write data to, memory.

2. Data register
• The data register contains the data to be written to, or read from, memory.

2.1.2 Instruction Cycle State Diagram
The execution cycle for a particular instruction may involve more than one reference to memory. For any given instruction cycle, some states may be null and others may be visited more than once. In the figure shown below, the three circles (operations) on the first line represent the CPU access to
memory or I/O, and the five circles (operations) on the second line represent the internal CPU operations. The states can be described as:
• Instruction Address Calculation (iac): Determine the address of the next instruction to be executed.
• Instruction Fetch (if): Read the instruction from its memory location into the CPU.
• Instruction Operation Decoding (iod): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
• Operand Address Calculation (oac): If the operation involves a reference to an operand in memory or available via I/O, determine the address of the operand.
• Operand Fetch (of): Fetch the operand from memory or read it in from I/O.
• Data Operation (do): Perform the operation indicated in the instruction.
• Operand Store (os): Write the result into memory or out to I/O.

2.2 ADDRESSING MODES

• The addressing mode refers to the mechanism by which the effective address is formed. Addressing modes are either explicitly specified or implied by the instruction.

1. Immediate Addressing
Description:
• The operand is directly specified in the operand field.
• The instruction is a multiword instruction in which the operands immediately follow the opcode.
• Both the opcode and the operand are fetched from memory using the program counter.
Uses of the immediate addressing mode:
→ Loading internal registers with an initial value.
→ Performing an arithmetic or logical operation on immediate data.

2. Direct Addressing
Description:
• The effective memory address where the operand is present is directly specified within the instruction.
• The instruction contains the opcode followed by the direct memory address. Both the opcode and the direct address are
fetched from memory using the program counter.
• The direct address is then used to access the operand.

3. Extended Addressing
Description:
• The effective memory address is directly specified within the instruction. It uses a 16-bit address.
• This is a slow way of accessing memory because the instruction is 3 bytes long and requires 3 memory accesses using the PC to acquire the instruction.

4. Register Addressing
Description:
• In this mode, the instruction opcode specifies the CPU registers where the operand is stored.
• Two points about its implementation:
→ When two registers are specified, one is used as the source while the other is used as the destination.
→ Using internal registers instead of memory for operands makes instructions in this mode execute faster than instructions in other modes.

5. Indirect Addressing
Description:
• In this mode, the instruction contains an address that points to the memory location where the effective address of the operand is stored.

6. Register Indirect Addressing
Description:
• In this mode, the instruction opcode specifies an internal register or register pair that contains the effective address to be used for accessing the operand in memory.
• This mode is used to save program space and improve the speed of program execution in situations where data elements are to be accessed from memory.

7. Base Addressing
• In this mode, the opcode specifies a register that contains a base address. The instruction also contains an offset field that holds a displacement.
• The effective address is formed by adding the base address and the displacement value.

8. Indexed Addressing
• In this mode, the opcode specifies a register that contains the offset value or displacement (the index). The instruction also contains an address field.
• The effective address is formed by adding the index value and the address.

9. Base Index Addressing
Description:
• This is a combination of two modes, i.e., base addressing and index addressing. The instruction opcode specifies two registers: a base register that contains the base address and an index register that contains an index value.
• Instructions in this mode can optionally use an 8-bit or 16-bit displacement. If a displacement is used, it is also added to obtain the effective address.

10. Relative Addressing
Description:
• In this mode, the operand comes from a location relative to the position of the instruction being executed.
• The operand effective address = contents of the program counter + the signed value specified by the instruction in its address field.

2.3 INSTRUCTION FORMATS

A program consists of a sequence of instructions, each one specifying some particular action.

Typical Instruction Formats

1. Three-address instructions
Example:
ADD R1, A, B
Description:
A processor with a three-address instruction format can use each address field to specify either a processor register or a memory operand.
Advantage
The three-address format results in short programs when evaluating arithmetic expressions.
Disadvantage
The binary-coded instructions require too many bits to specify the operands.
2. Two-address instructions
Format:
OPCODE ADDRESS1 ADDRESS2
Example:
MOV R1, A
Description:
• In this format, each address field can specify either a processor register or a memory operand.
• The first operand listed in an instruction is assumed to be both a source and the destination where the result of the operation is transferred.

3. One-address instructions
Format:
OPCODE ADDRESS1
Example:
LOAD A
Description:
• One-address instructions use an implied accumulator register for all data manipulation.
• All operations are done between the accumulator and a memory operand.

4. Zero-address instructions
Format:
OPCODE
Example:
ADD
MUL
Description:
A stack-organized computer does not use an address field for instructions such as ADD and MUL; the operands are taken implicitly from the top of the stack.

Expanding Opcodes
• Consider an (n+k)-bit instruction with a k-bit opcode and a single n-bit address. This instruction allows 2^k different operations and 2^n addressable memory cells. Alternatively, the same n+k bits could be broken up into a (k-1)-bit opcode and an (n+1)-bit address, which gives half as many operations but twice as many addressable cells.
• A (k+1)-bit opcode and an (n-1)-bit address gives more operations, but the price is either a smaller number of addressable cells or poorer resolution with the same amount of memory addressable.

2.4 INSTRUCTION INTERPRETATION

• Instruction interpretation is used for activating the control signals that cause the data processing unit to execute the instruction. The control signals are transmitted from the control unit to the outside through control lines.

• Control Specification
The four groups of control signals have the following functions.

No. | Signal | Description
1. | C’out | These signals directly control the operation of the data processing unit. The main function of the control unit is to generate C’out.
2. | C’in | These signals enable the data being processed to influence the control unit, allowing data-dependent decisions to be made. An important function of C’in is to indicate the occurrence of unusual conditions, such as errors, in the data processing unit.
3. | C”out | These signals are transmitted to other control units and may indicate status conditions such as “busy” or “operation completed”.
4. | C”in | These signals are received from other control units. They typically include start and stop signals and timing information.

C”in and C”out are primarily used to synchronize the control unit with the operation of other control units.
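As a quick numerical check of the expanding-opcode trade-off described at the end of Section 2.3, the sketch below tabulates how the number of operations and the number of addressable cells change as bits move between the opcode and the address field. The 16-bit instruction width and the helper name opcode_split are assumptions chosen only for illustration; they do not come from the text.

```python
def opcode_split(total_bits, opcode_bits):
    """Return (operations, addressable_cells) for one opcode/address split.

    total_bits is the instruction width (n + k in the text) and
    opcode_bits is the opcode width (k); the rest is the address field.
    """
    address_bits = total_bits - opcode_bits
    if opcode_bits < 0 or address_bits < 0:
        raise ValueError("field widths must be non-negative")
    return 2 ** opcode_bits, 2 ** address_bits


if __name__ == "__main__":
    # Hypothetical 16-bit instruction: compare k-1, k and k+1 for k = 4.
    for k in (3, 4, 5):
        ops, cells = opcode_split(16, k)
        print(f"opcode bits = {k}: {ops} operations, {cells} addressable cells")
```

Each bit moved from the address field to the opcode doubles the number of operations and halves the number of addressable cells, and vice versa.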
2.4.1 IMPLEMENTATION METHODS

Hardwired Control
In this approach, the control unit is designed with fixed logic circuits that interpret instructions and generate control signals from them.

Design Methods
The design of a hardwired control unit involves various complex tradeoffs between the amount of hardware used, its speed of operation and the cost of the design process itself.
We consider three simplified and systematic approaches to the design of hardwired controllers.

State-table Method
It is the standard algorithmic approach to sequential circuit design.
The required behavior of a control unit, like that of any finite-state sequential machine, can be represented by a state table as shown in the figure below.
• Let Cin and Cout denote the input and output variables of the control unit.
• The rows of the state table → the set of internal states {Si} of the machine.
• The columns of the state table → the set of external input signals to the control unit.
• The entry in row Si and column Ij (the j-th input combination) has the form Si,j, Zi,j,
where Si,j denotes the next state of the control unit and Zi,j denotes the output signals from Cout that are activated by the application of Ij to the control unit when it is in state Si.

2.4.2 Disadvantages of State-table Method
(I) The number of state and input combinations may be so large that the state-table size and the amount of computation needed become excessive.
(II) State tables tend to conceal useful information about a circuit’s behavior, for example, the existence of repeated patterns or loops.
(III) Control circuits designed from state tables also tend to have a random structure, which makes design, debugging and maintenance of the circuit difficult.

2.4.3 Delay-Element Method
A control unit using delay elements can be constructed directly from a flowchart that specifies the required control-signal sequences. Consider the problem of generating the following sequence of control signals at times t1, t2, ..., tn using a hardwired control unit.
t1 : Activate {C1,j};
t2 : Activate {C2,j};
...
tn : Activate {Cn,j};
An initiation signal called START (t1) is available at t1. START (t1) may be fanned out to {C1,j} to perform the first micro-operation.
If START (t1) is also entered into a time delay element of delay t2 - t1, the output of that circuit, START (t2), can be used to
activate {C2,j}. Similarly, another delay element of delay t3 - t2 with input START (t2) can be used to activate {C3,j}, and so on. Thus control signals can be generated by using delay elements.
• To ensure synchronous operation, the delay elements are implemented by D flip-flops controlled by a common clock signal. Since normally only one flip-flop is set, or “hot”, at any time and all other flip-flops are reset, this approach is also called the “one-hot” method.

Disadvantages
• The number of delay elements needed is approximately equal to the number of states, and each delay element is a sequential circuit of equal or greater complexity than a flip-flop.
• The delay-element approach produces expensive circuits in which timing is controlled by pulses traveling through cascades of clocked delay elements.
• Synchronization of many widely distributed delay elements may also be difficult.

2.4.4 Sequence-Counter Method
• Consider the circuit shown in the figure. It consists basically of a modulo-k counter whose output is connected to a 1/k clocked decoder. If the count enable input is connected to a clock source, the counter cycles continually through its k states.
The decoder generates k pulse signals {φi} on its output lines.
• Consecutive pulses are separated by one clock period, as shown in the figure.
The {φi} effectively divide the time required for one complete cycle of the counter into k equal parts.
• Two additional input lines and flip-flops are provided for turning the counter on and off. A pulse on the begin line causes the counter to begin cycling through its states by logically connecting the count enable line to the clock source.

2.5 MICROPROGRAMMED CONTROL

• Microprogramming is a method of control design in which the control signal selection and sequencing information is stored in a ROM or RAM called a control memory (CM).
• Each microinstruction also explicitly or implicitly specifies the next microinstruction to be used, thereby providing the necessary sequencing information.
• A set of related microinstructions is called a microprogram. In a microprogrammed CPU, each machine instruction is executed by a microprogram, which acts as a real-time interpreter for the instruction.

2.6 WILKES DESIGN

The microprogrammed control unit was proposed by M. V. Wilkes.
Consider the following figure to illustrate this scheme. Each microinstruction is a specific bit pattern made up of two parts, i.e.
(i) Micro-steps to be achieved
(ii) Address field that activates the next microinstruction to be executed, which in turn activates the required control signals

• The control memory is organized in matrix form, i.e. rows and columns.
Rows ⟹ microinstructions
Columns ⟹ micro-steps or the address of the next microinstruction
The row to be activated is selected by a decoder, and only one decoder output line is active at a time.
• The input to the decoder is given by the control memory address register (CMAR).
CMAR contents ⟹ the address of the current microinstruction, used to select that microinstruction.
The CMAR obtains the address of a microinstruction from one of two sources:
(i) External address source ⟹ gives the starting address of the microprogram stored in the control memory.
(ii) The address given by the address column lines.
• The first microinstruction is activated, which provides the micro-steps and the address of the next microinstruction to be activated. This address is accepted by the CMAR and used to activate the next microinstruction.
• This scheme also provides the facility of using external conditions or condition codes. This is provided by switches. A switch is activated when a row is active and is controlled by an external signal. The two possible conditions are used to activate two separate lines, which provide two separate addresses. These addresses can be used to provide conditional branching in microinstructions.

2.7 HORIZONTAL AND VERTICAL MICROINSTRUCTIONS

Horizontal Microinstructions:
• The term “horizontal” ⟹ the existence of a long control word that produces a horizontal pattern of 1’s and 0’s.
Horizontal microinstructions are able to control a variety of components operating in parallel.
• A horizontal microinstruction may initiate simultaneous, independent micro-operations for many registers, for a memory read or memory write operation and for the generation of the next address, all in the same microinstruction.
Advantage
Efficient hardware utilization
Disadvantage
Control memory becomes expensive.

• In a control word, the number of control bits can be reduced by grouping mutually exclusive variables into fields and encoding the k bits in each field to provide 2^k micro-operations.

2.7.1 Vertical Microinstructions

• A microinstruction format that is not horizontal is called a vertical microinstruction.
It requires decoding circuits external to the control memory.
• The term “vertical” implies that the encoding of fields necessitates decoding circuits that form a vertical pattern, which may consist of one or two levels of decoding.

2.7.2 Difference between Hardwired Control and Microprogrammed Control
Attribute | Hardwired control | Microprogrammed control
Speed | Comparatively fast | Comparatively slow
Control system | Implemented in hardware | Implemented in software (microprogram)
Flexibility | Not flexible; accommodating new system specifications or new instructions requires redesign | More flexible; new system specifications or new instructions can be accommodated
Instruction sets | Handling large/complex instruction sets is somewhat difficult | Handling large/complex instruction sets is easier
Operating system and diagnostic support | Very difficult | Easier
Design process | Somewhat complicated | Orderly and systematic
Applications | Used in RISC microprocessors | Used in mainframes and some microprocessors
Instruction set size | Usually under 100 instructions | Usually over 100 instructions
Chip area efficiency | Uses less area | Uses more area

• The following figure shows a logic circuit arrangement that implements this restoring-division technique.
• An n-bit positive divisor is loaded into register M and an n-bit positive dividend is loaded into register Q at the start of the operation.
• Register A is set to 0. After the division is complete, the n-bit quotient is in register Q and the remainder is in register A.
• The required subtractions are facilitated by using 2’s-complement arithmetic. The extra bit position at the left end of A and M accommodates the sign bit during subtractions.
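The register usage described above can be sketched in Python. This is a minimal illustration that assumes the standard shift, trial-subtract and restore sequence, which the surviving text does not spell out. The names A, Q and M follow the text; the function name and the use of unbounded Python integers in place of fixed-width registers are illustrative assumptions.

```python
def restoring_divide(dividend, divisor, n):
    """Divide two n-bit non-negative integers by restoring division.

    Q starts as the dividend, M holds the divisor and A starts at 0.
    Returns (quotient, remainder) as left in Q and A after n steps.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not (0 <= dividend < (1 << n)) or divisor >= (1 << n):
        raise ValueError("operands must fit in n bits")

    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift the combined A,Q register pair left by one bit.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Trial subtraction. A negative result plays the role of the
        # sign bit held in the extra position of A.
        a -= m
        if a < 0:
            a += m   # restore A; the new quotient bit stays 0
        else:
            q |= 1   # subtraction succeeded; the new quotient bit is 1
    return q, a


if __name__ == "__main__":
    print(restoring_divide(8, 3, 4))
```

For the 4-bit operands 8 and 3, the quotient left in Q is 2 and the remainder left in A is 2.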
3 MEMORY ORGANIZATION

3.1 INTRODUCTION

• Programs and the data they operate on are held in the memory of the computer. The execution speed of programs is highly dependent on the speed with which instructions and data can be transferred between the processor and the memory. Ideally, the memory would be fast, large and inexpensive, but it is impossible to meet all three requirements together.
• The maximum size of memory that can be used in any computer is determined by the addressing scheme. For example, a 16-bit computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory locations. In the same way, machines whose instructions generate 32-bit addresses can utilize a memory that contains up to 2^32 = 4G (giga) memory locations.
• The memories can be classified on various bases. The memory components of a computer system can be divided into three main groups according to their use. These are as follows:

Internal Processor Memory | A small set of high-speed registers used as a working memory for temporary storage of instructions and data.
Main Memory | A relatively large and fast memory used for program and data storage during computer operation.
Secondary Memory | Also known as auxiliary or backup memory; much larger but slower.

3.2 MEMORY HIERARCHY

Types of Memories
According to their use, the memory components of a computer system fall into the following three groups:

• Internal Processor Memory:
This comprises a small set of high-speed registers used as a working memory for temporary storage of instructions and data.

• Main Memory:
It is also called primary memory. This is a relatively large and fast memory used for program and data storage during computer operation. It is characterized by the fact that locations in main memory can be accessed directly and rapidly by the CPU instruction set. The technology used for main memory is based on semiconductor integrated circuits.

• Secondary Memory:
It is also known as auxiliary or backing memory. This is generally much larger in capacity but also much slower than main memory. It is used for storing system programs, large data files and information or data that are not continually required by the CPU. It also serves as an overflow memory when the capacity of the main memory is exceeded. Information in secondary storage is accessed indirectly via input-output programs that first transfer the required information to main memory. Representative technologies used for secondary memory are magnetic disks and tapes.

• Memory hierarchy:
A computer’s memory units form a hierarchy of different memory types in which each member is in some sense subordinate to the next-highest member of the hierarchy.
Consider a general n-level system of n memory types (M1, M2, ..., Mn). Fig (1) shows some examples with n = 2, 3 and 4. Typical technologies used in these hierarchies are semiconductor SRAMs for cache memory, semiconductor DRAMs for main memory and magnetic disk units for secondary memory. The two-level hierarchy of Fig (1)-a is typical of early computers. Fig (1)-b adds a cache of a type called a split cache, since it has separate areas for storing instructions (the I-cache) and data (the D-cache). The third example, Fig (1)-c, has two cache levels, both of the non-split or unified type. Embedded microcontrollers also use the various hierarchical organizations but often lack the secondary or the cache level.
The following relations normally hold between adjacent memory levels Mi and Mi+1 in a memory hierarchy.

Cost per bit: Ci > Ci+1
Access time: tAi < tAi+1
Storage capacity: Si < Si+1

The difference in cost, access time and capacity between Mi and Mi+1 can be several orders of magnitude. Considerable system resources are devoted to shielding the CPU from these differences, so that it almost always sees a very large and inexpensive memory space and rarely sees an access time greater than that of M1, the first level of the memory hierarchy.
The CPU and other processors can communicate directly with M1 only; M1 can communicate with M2, and so on. Consequently, for the CPU to read information held in some memory level Mi requires a sequence of i data transfers of the form
Mi-1 := Mi; Mi-2 := Mi-1; Mi-3 := Mi-2; ...; CPU := M1
An exception is allowed in the case of caches: the CPU may be designed to bypass the cache levels and go directly to main memory. In general, all the information stored in Mi at any time is also stored in Mi+1, but not vice versa.

During program execution the CPU produces a steady stream of memory addresses. At any time, these addresses are distributed in some fashion throughout the memory hierarchy. If an address is generated that is currently assigned only to Mi where i ≠ 1, the address must be reassigned to M1, the level of the memory hierarchy that the CPU can access directly. This relocation of addresses involves the transfer of data between levels Mi and Mi-1, a relatively slow process. For a memory hierarchy to work efficiently, the addresses generated by the CPU should be found in M1 as often as possible, which requires that information be transferred to M1 before it is actually used by the CPU. If the desired data cannot be found in M1, then the program originating the memory request must be suspended until an appropriate reallocation of storage is made.
Main and secondary memory form another two-level subhierarchy (Fig (1)-b). This interaction is managed by the operating system, however, and so is not transparent to system software, although it is somewhat transparent to user code.

Example
Consider a two-level memory hierarchy (M1, M2), and let C1 and C2 be the costs per bit, t1 and t2 the access times and S1 and S2 the memory capacities of M1 and M2 respectively.
a) Under what conditions will the average cost of the entire memory system approach C2?
b) What is the effective memory access time ta of this hierarchy?
c) Express the access efficiency E in terms of the speed ratio and the hit ratio.
d) Plot E against H for r = 5, 20 and 100 respectively and comment on the performance.

We consider a two-level memory hierarchy (M1, M2). The average cost per bit of memory is given by:
$$C = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$
• To achieve the goal of making C approach C2, S1 must be very small compared to S2.
The hit ratio H is defined as the probability that a logical address generated by the CPU refers to information stored in M1. H should be as close to 1 as possible.
• The numbers of address references satisfied by M1 and M2 are denoted by n1 and n2 respectively.
$$H = \frac{n_1}{n_1 + n_2}$$
(1 - H) is the miss ratio. t1 and t2 are the access times of M1 and M2 respectively. The average time ta for the CPU to access a word in the memory system is given by
ta = H × t1 + (1 - H) × t2 ……….(1)
• On a miss, the block containing the requested word is first transferred from M2 to M1. The time tb required for the block transfer is called the block-replacement or block-transfer time, so t2 = tb + t1 and equation (1) becomes
ta = t1 + (1 - H) × tb
Block transfer requires a relatively slow I/O operation; therefore, tb is usually much greater than t1. Hence, t2 >> t1 and t2 ≈ tb. Let r = t2/t1 denote the access-time ratio of the two levels of memory.
Let E be the access efficiency of the virtual memory.
E = t1/ta, which is the factor by which ta differs from its minimum possible value. From equation (1), we obtain:
$$E = \frac{1}{r + (1 - r)H}$$

• In the figure below, E is plotted as a function of H. This graph shows the importance of achieving high values of H in order to make E ≈ 1, i.e., ta ≈ t1.
• Memory capacity is limited by cost considerations. The efficiency with which space is being used at any time can be loosely defined as the ratio of the memory space Su occupied by active parts of user programs to the total amount of memory space available, S. This space utilization U is given as U = Su/S.
• Since main memory space is more valuable than secondary memory space, it is useful to restrict U to measuring main memory space utilization. In that case, the S - Su words of M1 that represent control space can be attributed to several sources.
Figure: Access efficiency E of a two-level memory system as a function of hit ratio H for various values of r = t2/t1.

Example
Consider a 2-level memory hierarchy (M1, M2), where M1 is directly connected to the CPU. Determine the average cost per bit C and the average access time ta for the data given below:

Level (i) | Capacity (Si) | Cost (Ci) | Access time (tAi) | Hit ratio (H)
M1 (cache) | 1024 | 0.1000 | 10^-8 | 0.9000
M2 (main) | 2^16 | 0.0100 | 10^-6 | 0.9999

Solution
$$C = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2} = \frac{0.1000 \times 1024 + 0.0100 \times 2^{16}}{2^{10} + 2^{16}}$$
$$= \frac{2^{10}\,(0.1 + 0.01 \times 2^{6})}{2^{10}\,(1 + 2^{6})} = \frac{0.74}{65} = 0.01138$$
∴ Average cost per bit C = 0.01138.
Using the M1 hit ratio H = 0.9, so that 1 - H = 0.1:
ta = H × t1 + (1 - H) × t2 = 0.9 × 10^-8 + 0.1 × 10^-6
= 10^-6 [0.009 + 0.1] = 0.109 × 10^-6 sec.

3.3 MEMORY CHARACTERISTICS

The properties to be considered when evaluating any memory technology are:

1) Cost: The price should include the cost of the information storage cells as well as the cost of the peripheral equipment or access circuitry essential for the operation of the memory.
cost = price of complete memory system / total bits of storage capacity.

2) Access Time: It is the time required to read or write a fixed amount of information, e.g. one word, from the memory. Access time depends upon the physical characteristics of the storage medium and also on the type of access mechanism used. It is usually calculated from the time a read request is received by the memory unit to the time at which the requested information is made available at the memory output terminals. The access rate, measured in words per second, is another widely used performance measure for storage devices. Thus, low cost and high access rates are desirable memory characteristics.

3) Access Modes: It is the order or sequence in which information can be accessed. Memory can be accessed randomly or sequentially. In random access memories each storage location can be accessed independently of the other locations, whereas in serial access memories storage locations can be accessed only in a certain predetermined sequence.

4) Alterability: Memories whose contents cannot be altered online are called Read Only Memories (ROMs). Memories in which reading or writing can be done online are called read-write memories. All memories used for temporary storage purposes are read-write memories.

5) Permanence of Storage: The physical processes involved in storage are sometimes inherently unstable, so that stored information may be lost over a period of time unless appropriate action is taken. There are three important memory characteristics that can destroy information: destructive readout, dynamic storage and volatility. In destructive readout, the memory contents are destroyed (erased) as the memory is read. Memories which require periodic refreshing are called dynamic memories. Static memories do not require refreshing. If the contents of memory are lost in case of power failure, the memory is termed volatile memory.

6) Cycle Time and Data Transfer Rate: The minimum time that must elapse between the initiations of two different memory accesses can be greater than the access time; this loosely defined time is called the cycle time tm of the memory. It is generally convenient to assume that the cycle time is the time needed to complete any read or write operation in memory.
The maximum amount of data that can be transferred to or from memory per second is 1/tm and is called the data transfer rate. The access time may be more important in measuring overall computer system performance, since it determines the length of time the processor must wait after initiating a memory request.

7) Physical Characteristics: Many different physical properties of matter are used for information storage. The most important properties used for this purpose are classified as electronic, magnetic,
mechanical and optical. A factor determining the physical size of a memory unit is the storage density, measured in bits per unit area. In general, memories with no moving parts have much higher reliability than memories such as magnetic disks, which involve considerable mechanical motion.

3.4 SEMICONDUCTOR RAM MEMORIES

Semiconductor memories are available in a wide range of speeds. Their cycle times range from 100 ns to less than 10 ns.

3.4.1 Internal Organization of Memory Chips

• Consider the following memory organization, in which the memory cells are organized in the form of an array and each cell is capable of storing one bit of information.
• Each row forms a memory word, and all cells of a row are connected to a common line called the word line, which is driven by the address decoder on the chip.
• In each column, the cells are connected to a sense/write circuit by two bit lines. The sense/write circuits are connected to the data input/output lines of the chip.
• During a read operation ⟹ these circuits sense, or read, the information stored in the cells selected by a word line and transmit this information to the output data lines.
• During a write operation ⟹ the sense/write circuits receive input information and store it in the cells of the selected word.
• The organization shown above in fig. 2 is an example of a very small memory chip consisting of 16 words of 8 bits each, referred to as a 16 × 8 organization.
• The data input and the data output of each sense/write circuit are connected to a single bidirectional data line.
• Two control lines, R/W and CS, are provided in addition to the address and data lines.
The R/W input specifies the required operation, and the CS input selects a given chip in a multichip memory system.
• The circuit shown in fig. 2 above stores 128 bits and requires 14 external connections for address, data and control lines (4 address lines, 8 data lines and 2 control lines).
Semiconductor memories may be divided into bipolar and MOS (metal-oxide semiconductor) types.
• Semiconductor RAM: In semiconductor memories, the basic storage cells are transistor circuits. Semiconductor memories fall into two main categories, static and dynamic.
• Static RAM: These RAMs are composed of memory cells that resemble the flip-flops used in processor registers. In a dynamic RAM cell, the 1 and 0 states correspond to the presence or absence of a stored charge in a capacitor controlled by a transistor switching circuit. Since a dynamic RAM cell can be constructed around a single transistor, whereas a static cell requires up to six transistors, higher storage density is achieved with dynamic RAM designs. However, the stored charge must be refreshed periodically; consequently, dynamic RAMs are more difficult to use than static RAMs. Unlike ferrite cores, semiconductor memories, both static and dynamic, are volatile, so the stored information is lost when the power source is removed.
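The pin count quoted in Section 3.4.1 for the 16 × 8 organization can be reproduced with a short calculation. The sketch below assumes a binary-encoded address, one bidirectional data line per bit and the two control lines (R/W and CS) named in the text. The function name is hypothetical, and power and ground pins are deliberately not counted.

```python
def external_connections(words, bits_per_word, control_lines=2):
    """Count address, data and control pins for a words x bits_per_word chip.

    Assumes a binary-encoded address and one bidirectional data line
    per bit; power and ground pins are not included.
    """
    if words < 1 or bits_per_word < 1:
        raise ValueError("chip dimensions must be positive")
    # Number of address bits needed to select one of `words` rows.
    address_lines = (words - 1).bit_length()
    return address_lines + bits_per_word + control_lines


if __name__ == "__main__":
    print(16 * 8, "bits,", external_connections(16, 8), "connections")
```

For 16 words of 8 bits this gives 4 address lines, 8 data lines and 2 control lines, which matches the 14 connections stated above.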
3.4.2 Random Access Memory (RAM) Various features of RAM organization are:
 The storage cells are physically
arranged as rectangular arrays of cells.
 The memory address is punched into
the components, so that, the address Ai
of cell Ci becomes a d-dimension vector
(A1,1, A1,2…….A  1.d) = A  .
 Random access memory (RAMs) are

characterized by the fact that every
location can be accessed independently.
The access and cycle times for every
location are constant and independent
of its position.
 Fig.6 shows the main components of a
random access memory unit.
The storage cell unit comprises N cell
each of which can store 2 bit of
information.
The memory operates as follows:
• The address of the required location is transferred via the address bus to the memory address register.
• The address is then processed by the address decoder, which selects the required location in the storage cell unit.
• A read-write select control line specifies the type of access to be performed. If a read is requested, the contents of the selected location are transferred to the output data register. If a write is requested, the word to be written is first placed in the memory input data register and then transferred to the selected cell.
3.4.2.1 RAM Organization
Example:
Design a 4M × 16 memory unit using 256K × 1 memory chips. Explain in detail the assumptions made while designing the system.
3.4.2.2 DESIGNING EXAMPLES
Example: Design a 16K byte RAM using 256 × 4 bit RAM ICs.
Solution
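As a starting point for both design examples, the number of chips can be estimated by dividing the required word count and word width by those of a single chip. The sketch below is only an illustration under that assumption; the function name chips_needed and its return layout are hypothetical and not part of the original solution.

```python
def chips_needed(target_words, target_bits, chip_words, chip_bits):
    """Return (rows, chips_per_row, total_chips) for a memory built from smaller chips."""
    # Ceiling division, in case the target is not an exact multiple of the chip size.
    rows = -(-target_words // chip_words)      # chip groups needed to cover the address range
    per_row = -(-target_bits // chip_bits)     # chips placed side by side to cover the word width
    return rows, per_row, rows * per_row

K = 1024
M = 1024 * K

# 4M x 16 memory from 256K x 1 chips
print(chips_needed(4 * M, 16, 256 * K, 1))
# 16K x 8 (16K byte) memory from 256 x 4 chips
print(chips_needed(16 * K, 8, 256, 4))
```

Under these assumptions the 4M × 16 unit needs 16 rows of 16 chips (256 chips): the lower 18 of the 22 address bits go to every chip and the upper 4 bits drive a 4-to-16 decoder for chip select. The 16K byte RAM needs 64 rows of 2 chips (128 chips), with the lower 8 of the 14 address bits going to every chip and the upper 6 bits driving a 6-to-64 decoder.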
Solution
3.4.2.3 2-D Memory Organization
• The access circuitry needed has a very significant effect on the total cost of any memory unit. A general approach to reducing the access circuitry cost in random-access memories is called matrix organization.
• In this, the memory address is partitioned into 'd' components, so that the address Ai of cell Ci becomes a d-dimensional vector.
• A particular cell is selected by simultaneously activating all 'd' of its address lines. A memory unit with this kind of addressing is said to be a 'd-dimensional memory'.
• In the two-dimensional (2-D) organization, the address field is divided into two components called X and Y, which consist of 'ax' and 'ay' bits respectively, as shown in Fig. 8.
• The cells are arranged in a rectangular array of Nx <= 2^ax rows and Ny <= 2^ay columns, so that the total number of cells is N = Nx × Ny.
• A cell is selected by the coincidence of signals on its 'X' and 'Y' address lines. The 2-D organization requires substantially less access circuitry than the 1-D organization for a fixed amount of storage.
3.4.3 READ-ONLY MEMORIES
There are many applications that need memory devices which retain the stored information when power is turned off. But both SRAM and DRAM chips are volatile, i.e., they lose the stored information when power is turned off.
1. ROM
• It is a nonvolatile memory, and its normal operation involves only reading of stored data. A memory of this type is called a read only memory (ROM).
• Fig. 9 shows a ROM cell.
• If the transistor is connected to ground at point P, a logic value 0 is stored in the cell; otherwise a 1 is stored. The bit line is connected through a resistor to the power supply. To read the state of the cell, the word line is activated. Thus, the transistor switch is closed and the voltage on the bit line drops to near zero if there is a connection between the transistor and ground. If there is no connection to ground, the bit line remains at a high voltage, indicating a 1. A
sense circuit at the end of the bit line generates the proper output value.
2. PROM
• Some designs allow the data to be loaded by the user by providing a programmable ROM (PROM). This is achieved by inserting a fuse at point P in Fig. 9.
• Before programming, the memory contains all 0's. The programmer can insert 1's at the required locations by burning out the fuses at these locations using high current pulses. This process is irreversible.
• It is more flexible and convenient than a ROM. Because PROMs can be programmed directly by the user, they are also more cost effective than ROMs when only a small number is required.
3. EPROM
• It is an erasable, reprogrammable ROM, i.e., the stored data can be erased and new data can be loaded. It provides flexibility while designing a digital system.
• The structure of an EPROM cell is similar to the ROM cell shown in Fig. 9, but in an EPROM cell the connection to ground is always made at point P and a special transistor is used, which has the ability to function either as a normal transistor or as a disabled transistor that is always turned off.
• Advantage: EPROM contents can be erased and reprogrammed.
• Disadvantage: The chip must be physically removed from the circuit for reprogramming, and its entire contents are erased by the ultraviolet light.
4. EEPROM
• It is programmed and erased electrically and does not have to be removed for erasure.
• It is possible to erase the cell contents selectively.
• Disadvantage: It requires different voltages for erasing, writing and reading the stored data.
3.4.4 CACHE MEMORY ARCHITECTURE AND WORKING
Cache memory is positioned logically between the CPU and main memory.
A cache's storage capacity is less than that of main memory, but it has an access time of only 1 to 3 cycles. Cache is much faster than main memory because some or all of it can reside on the same IC as the CPU, as the cache is small in size.
Caches are essential components of high performance computers that aim to reduce CPU wait time. Unlike the other three memories in the hierarchy, the cache is transparent to the user.
3.4.4.1 Memory Hierarchy
3.4.4.2 Duplication:
In the memory hierarchy, whatever is present in a higher (faster) level is a duplicate of information that is also held in the lower level, i.e., whatever is present in the cache memory is also present in main memory.
For the user, the secondary memory acts as memory, but for the CPU it acts as an I/O device.
3.4.4.3 Cache Organization
• Memory words are stored in the "cache data memory" and are grouped into small pages (blocks). Thus, the contents of the cache's data memory are a copy of a set of main memory blocks. Each cache block is marked with its block address, referred
to as a 'tag', so the cache knows to what part of the memory space the block belongs.
The collection of tag addresses currently assigned to the cache, which can be non-contiguous, is stored in a special memory, the 'cache tag memory' or directory.
Example:
If Bj is a block containing data Dj in M1, then Bj is in the cache tag memory and Dj is in the cache data memory.
• The cache memory is used to improve the performance of the computer. Hence, the access time of the cache should be less than that of main memory. For example, if main memory (M2) is implemented with DRAM technology having an access time tA2 = 50 ns, then the cache (M1) might be implemented with an SRAM technology having an access time tA1 = 10 ns.
3.4.5 Types of Cache
(i) Cache Look Aside
• In this type, the cache and the main memory are directly connected to the system bus. The CPU starts a memory access by placing an address Ai on the memory address bus at the start of a read (load) or write (store) cycle.
• The cache M1 immediately compares Ai to the tag addresses currently residing in its tag memory.
• If a match is found in M1, a cache hit occurs and main memory is not involved.
• But if no match for Ai is found in the cache, a cache miss occurs and Ai is then matched against the main memory M2 address.
• In response to a cache miss, the block Bi containing address Ai is transferred from M2 to M1, i.e., copied from main memory to the cache.
(ii) Look Through Cache
• In this method, the CPU communicates with the cache memory via a separate bus that is isolated from the main system bus. With a look-through cache the CPU does not automatically send all requests to main memory. A request goes to main memory only when a cache miss occurs.
Thus, each access is handled at the cache memory level until a cache miss occurs, and only when a cache miss occurs is main memory involved.
• Write policies: During a write operation, when something is written to the cache, main memory is not updated at the same time. So when a main memory block later overwrites that cache block, the updated data in the cache is lost. To prevent this, write policies are used.
1. Write through policy:
For every write operation, data is written to the cache and also to main memory. So there is no loss of data, i.e., every update goes up to the main memory level.
2. Write back policy:
For execution of instructions, updates are made each time only up to the cache level. Main memory is updated only when the modified cache block is replaced: the cache block is copied back to main memory and the required main memory block is transferred to the cache.
• The cost of the write back policy can be reduced with a 'dirty bit' or 'modified bit'. If dirty bit = 1, the block has been modified in the cache and must be written to main memory when it is replaced. If dirty bit = 0, the block is clean and need not be copied to main memory.
3.4.6 MAPPING METHODS
Mapping is needed between main memory blocks and cache lines since there are fewer cache lines than main memory blocks. Also, a means is needed for determining which main memory block currently occupies a cache line.
Three techniques can be used:
(i) Direct
(ii) Associative
(iii) Set associative
(i) Direct Mapping
• It maps each block of main memory into only one possible cache line, and the mapping is expressed as
i = j modulo m
where,
i = cache line number
j = main memory block number
m = number of lines in the cache
• The mapping function is easily implemented using the address. For cache access, each main memory address can be viewed as consisting of three fields.
• The least significant w bits identify a unique word or byte within a block of main memory.
• The remaining s bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a tag of s-r bits (most significant portion) and a line field of r bits.
Address length                  (s + w) bits
No. of addressable units        2^(s+w) words or bytes
Block size = line size          2^w words or bytes
No. of blocks in main memory    2^(s+w)/2^w = 2^s
No. of lines in cache           m = 2^r
Size of tag                     (s - r) bits
(ii) Associative Mapping
• It overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
• The cache control logic interprets the memory address simply as a tag and a word field. The tag field uniquely identifies a block of main memory.
Note: To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's tag for a match. No field in the address corresponds to the line number, so the number of lines in the cache is not determined by the address format.
Address length                  (s + w) bits
No. of addressable units        2^(s+w) words or bytes
Block size = line size          2^w words or bytes
No. of blocks in main memory    2^(s+w)/2^w = 2^s
No. of lines in cache           undetermined
Size of tag                     s bits
• With associative mapping, there is flexibility as to which block to replace when a new block is read into the cache.
• Disadvantage: Associative mapping requires complex circuitry to examine the tags of all cache lines in parallel.
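The field definitions for direct and associative mapping can be checked with a few shifts and masks. The following is a minimal sketch, assuming illustrative sizes of w = 4 and r = 7; the function names direct_mapped_fields and associative_fields are hypothetical and chosen only for this example.

```python
def direct_mapped_fields(address, w, r):
    """Split an (s + w)-bit address into (tag, line, word) for a direct-mapped cache."""
    word = address & ((1 << w) - 1)      # least significant w bits: word within the block
    block = address >> w                 # remaining s bits: main memory block number j
    line = block & ((1 << r) - 1)        # i = j modulo m, where m = 2**r cache lines
    tag = block >> r                     # most significant s - r bits
    return tag, line, word

def associative_fields(address, w):
    """Split an address into (tag, word); the tag is the full s-bit block number."""
    return address >> w, address & ((1 << w) - 1)

# Assumed sizes for illustration: 16-word blocks (w = 4) and 128 cache lines (r = 7).
w, r = 4, 7
address = 0xABCD
print(direct_mapped_fields(address, w, r))
print(associative_fields(address, w))
```

Because m is a power of two, taking j modulo m is the same as keeping the low r bits of the block number, which is why the line field needs no arithmetic in hardware.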
(iii) Set Associative Mapping
• It is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
• In this, the cache is divided into v sets, each of which consists of k lines.
m = v × k
i = j modulo v
where,
i = cache set number
j = main memory block number
m = number of lines in the cache
This is referred to as k-way set associative mapping.
• With set associative mapping, block Bj can be mapped into any of the lines of set i. In this case, the cache control logic interprets a memory address as three fields: tag, set and word.
• The d set bits specify one of v = 2^d sets. The s bits of the tag and set fields together specify one of the 2^s blocks of main memory. With k-way set associative mapping, the tag in a memory address is much smaller and is only compared to the k tags within a single set.
Address length                  (s + w) bits
No. of addressable units        2^(s+w) words or bytes
Block size = line size          2^w words or bytes
No. of blocks in main memory    2^(s+w)/2^w = 2^s
No. of lines in set             k
No. of sets                     v = 2^d
No. of lines in cache           kv = k × 2^d
Size of tag                     (s - d) bits
• In the extreme case of v = m, k = 1, the set associative technique reduces to direct mapping, and for v = 1, k = m, it reduces to associative mapping. The use of two lines per set (v = m/2, k = 2) is the most common set associative organization.
• It significantly improves the hit ratio over direct mapping. Four-way set associative (v = m/4, k = 4) makes a modest additional improvement for a relatively small additional cost. Further increase in the number of lines per set has little effect.
• There are three possible mapping methods to specify where main memory blocks are placed in the cache:
• Direct mapping method
• Associative mapping method
• Block-set associative mapping method
To discuss the mapping methods, consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words, and assume that the main memory is addressable by a 16-bit address. For mapping purposes, the main memory will be viewed as composed of 4K blocks.
3.4.6.1 Direct Mapping
• This is the simplest way to associate main memory (MM) blocks with cache blocks. In this technique, block K of main memory maps onto block K modulo 128 of the cache. Here more than one main memory block is mapped onto a given cache block position.
• A main memory address can be divided into three fields as shown in the figure below. When a new block enters the cache, the 7-bit cache block field
determines the cache position in which this block must be stored. The high order five bits of the main memory address of the block are stored in five tag bits associated with its location in the cache.
• As execution proceeds, the 7-bit cache block field of each address generated by the CPU points to a particular block location in the cache.
• The tag field of that block is compared to the tag field of the address. If they match, then the desired word is in that block of the cache.
• If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache. The direct mapping method is easy to implement, but it is not very flexible.
3.4.6.2 Associative Mapping
• The associative mapping technique is a much more flexible method in which a main memory block can potentially reside in any cache block position.
• In this case, 12 tag bits are required to identify a MM block when it is resident in the cache. The tag bits of an address received from the CPU are compared to the tag bits of each block of the cache to see if the desired block is present.
• Its cost of implementation is higher than the cost of the direct mapping scheme because of the need to search all 128 tag patterns to determine whether a given block is in the cache. A search of this kind is called an associative search.
3.4.6.3 Set-Associative Mapping Technique
• This is a combination of the two techniques, direct mapping and associative mapping. Here the blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set.
• In this, the hardware cost is reduced by decreasing the size of the associative search. An example of block-set associative mapping is shown in the following figure for a cache with two blocks per set.
• Here, the 6-bit set field of the address determines which set of the cache might contain the desired block. The tag field must be associatively compared to the tags of the two blocks of the set to check if the desired block is present.
Replacement Algorithms
• When a new block is to be brought into the cache and all the positions that it may occupy are full, the cache controller must decide which of the old blocks to overwrite. In a direct mapped cache, the position of each block is predetermined; therefore, no replacement strategy is required. But in associative and set associative caches there exists some flexibility.
• In general, the aim is to keep blocks in the cache that are likely to be
referenced, but determining which blocks are about to be referenced is difficult. Since programs usually reside in localized areas for reasonable periods of time, there is a high probability that blocks that have been referenced recently will be referenced again.
• When a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block and the technique is called the LRU replacement algorithm.
• To perform the LRU function, the cache controller must track references to all blocks as computation progresses.
For Example:
It is required to track the LRU block of a four-block set in a set-associative cache. A 2-bit counter can be used for each block. When a hit occurs, the counter of the block that is referenced is set to 0. Counters with values originally lower than the referenced one are incremented by one, and all others remain unchanged. When a miss occurs and the set is not full, the counter associated with the new block loaded from the main memory is set to 0, and the values of the other counters are increased by one. When a miss occurs and the set is full, the block with the counter value 3 is removed, the new block is put in its place, and its counter is set to 0. The other three block counters are incremented by one. It can be easily verified that the counter values of occupied blocks are always distinct.
• The LRU algorithm has been used extensively. Although it performs well for many access patterns, it can lead to poor performance in some cases. Performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block to replace.
3.4.7 Memory Interleaving
• If the main memory of a computer is structured as a collection of physically separate modules, each with its own address buffer register (ABR) and data buffer register (DBR), then memory access operations may proceed in more than one module at the same time.
• This type of memory is known as multiple-module memory. Here, the average rate of transmission of words to and from the total main memory system can be increased. In a multiple-module memory the modules can be addressed in two ways.
• In the first method, the main memory address generated by the CPU is decoded as shown in Fig. 16.
• In Fig. 16, the high order K bits name one of n modules and the low order m bits name a particular word in that module. If the CPU issues read requests to consecutive locations, as it does when fetching instructions of a straight-line program, then only one module is kept busy by the CPU. However, devices with direct memory access (DMA) ability may be accessing information in other memory modules.
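A small sketch makes the effect of the module-selection bits concrete. It decodes addresses with the first method (high-order bits select the module) and, for contrast, with the low-order bits selecting the module, which is the second method this section describes. The sizes k = 2 and m = 4 and the function names are assumptions made only for this illustration.

```python
def decode_first_method(address, k, m):
    """High-order k bits select the module, low-order m bits select the word."""
    module = address >> m
    word = address & ((1 << m) - 1)
    return module, word

def decode_interleaved(address, k, m):
    """Low-order k bits select the module, high-order m bits select the word."""
    module = address & ((1 << k) - 1)
    word = address >> k
    return module, word

# Assumed sizes: 2**2 = 4 modules, each holding 2**4 = 16 words.
k, m = 2, 4
consecutive = range(8)
print([decode_first_method(a, k, m)[0] for a in consecutive])   # modules used by addresses 0..7
print([decode_interleaved(a, k, m)[0] for a in consecutive])
```

With the first method, addresses 0 to 7 all fall in module 0, so a sequential fetch keeps one module busy. With the low-order bits selecting the module, the same addresses cycle through modules 0, 1, 2, 3, which is what allows several modules to work at once.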
• The second and more effective method to address the modules is shown in Fig. 17. In this, the low order K bits of the main memory address select a module and the high order m bits name a location within that module. In this way, consecutive addresses are located in successive modules. This is called memory interleaving.
• In this, any component of the system that generates requests for access to consecutive main memory locations can keep a number of modules busy at any given time. This results in a higher average utilization of the memory system as a whole.
• To take advantage of memory interleaving, the CPU or the DMA device must be capable of initiating a memory access operation while waiting for a previous memory access to be completed.
• In memory interleaving there must be 2^k modules; otherwise, there will be gaps of non-existent locations in the main memory address space. In the first method, an existing system can be expanded simply by adding one or more modules as required.
• But in the second method, the system must always have the full set of 2^k modules, and a failure in any module affects all areas of the address space. A failed module in the first system affects only a localized area of the address space.
3.4.8 Example of Low order and High order interleaving
In an interleaved memory system, the CPU selects one of the RAM modules through its chip select (CS) input, which is driven by a decoder. The inputs to the 3:8 decoder from the CPU are three address lines, and the remaining address lines all go to the RAM ICs. The outputs of the decoder are given to the CS of each RAM. For 64K bytes, we require eight 8K byte RAM ICs. Depending on which address lines are given to the ICs and which to the decoder, this interleaved memory is divided into two types. In the names used below, 'low-order' and 'higher-order' refer to the address lines wired to the RAM ICs themselves, not to the lines that drive the decoder.
(i) Low-order Interleaving Memory
In this memory, the low order address lines are given to each 8K RAM IC, while the CPU gives the three higher order address lines to the inputs of the decoder, which selects one of the RAM ICs. When A13 A14 A15 = 000, the first IC is selected, and the memory location within that particular IC is selected by lines A0-A12.
(ii) Higher-Order Interleaved Memory
In this memory system, the higher order address lines, i.e., A3-A15, are given to the ICs and the low order address lines are given to the decoder from the CPU to select the IC.
3.5 VIRTUAL MEMORY TECHNOLOGY
• The physical main memory is not as large as the processor actually thinks.
When a program does not completely fit into the main memory, the parts of it that are not currently being executed are stored in secondary storage devices. The techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual memory techniques. The binary addresses that the processor issues for either instructions or data are called virtual or logical addresses. These addresses are translated into physical addresses by a combination of hardware and software components.
• Figure 18 shows the typical organization that implements virtual memory. The memory management unit (MMU) is a hardware device that translates virtual addresses into physical addresses.
• When data are in the main memory, these data are fetched to the cache memory. If data are not in main memory, the MMU causes the operating system to bring the data into the memory from the disk. Transfer of data between main memory and disk is performed using the DMA scheme.
• It is assumed that all programs and data are composed of fixed length units called pages. Each page consists of a block of words that occupies contiguous locations in the main memory. Page length is normally 2K to 16K bytes. Virtual memory address translation is based on fixed length pages. Each virtual address generated by the processor contains a virtual page number and an offset. Information about the main memory location of each page is kept in a page table. An area in the main memory that can hold one page is called a page frame. The starting address of the page table is kept in a page table base register. By adding the virtual page number to the content of this register, the address of the corresponding entry in the page table is obtained. Each entry in the page table also includes some control bits that describe the status of the page while it is in main memory. The page table information is used by the MMU for every read and write access. A small cache, usually called the Translation Look-aside Buffer (TLB), is incorporated into the MMU. It consists of the page table entries that correspond to the most recently accessed pages.
3.6 Advantages of using Virtual Memory
• Virtual memory is the separation of user logical memory from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.
• Virtual memory makes the task of programming much easier, because the programmer no longer needs to worry about the amount of physical memory available or about what code can be placed in overlays, but can concentrate instead on the problem to be programmed.
• On systems which support virtual memory, overlays have virtually disappeared.
• Virtual memory is commonly implemented by demand paging. It can also be implemented in a segmentation system. Several systems provide a paged segmentation scheme, where segments are broken into pages. Demand segmentation can also be used to provide virtual memory.
3.7 Paging, Segmentation and Paged Segments
Main and secondary memory form another two-level hierarchy. This interaction is managed by the operating system. However, it is not transparent to system software but is somewhat transparent to the user code. The term 'virtual memory' is applied when main and secondary memory appear to a user program like a single, large and directly addressable memory.
• Three reasons for using virtual memory:
• To free the user from the need to carry out storage reallocation and permit the efficient sharing of available memory space by the different users.
• To make the program independent of the configuration and capacity of the physical memory for execution.
• To achieve the very low cost per bit and low access time that are possible with a memory hierarchy.
The program is divided into a number of blocks of virtual memory, which together are known as the 'virtual address space'.
• A page is a fixed length block which can be assigned to fixed regions of physical memory called 'page frames'. The physical memory space is divided into equal size blocks, each of which is a 'page frame'. The size of a page frame is equal to the size of a page. Dividing the virtual address space into equal sized blocks is known as 'paging'.
• In a pure paging system, each virtual address consists of two parts: a page address (number) and a displacement.
• The address used for translation to physical space has the form p.n, where p is the page address and n is the displacement.
The memory map, now referred to as a 'page table', consists of the following information.
Page address   Page frame   Presence bit P   Change bit C   Access rights
A              000000       1                0              R, X
C              06C7F9       0                1              R, W, X
• Each virtual page address has a corresponding real address of a page frame in main or secondary memory. When presence bit P = 1, the required page is in main memory and the base address of its page frame is stored in the page table. If P = 0, a page fault occurs. The change bit C indicates whether or not the page has been changed since it was last loaded into main memory.
• The page table can also contain memory protection data that specifies the access rights of the current program to read from, write into or execute the page.
• Since all page frames have the same size and any page fits into any free frame, no external fragmentation exists in paging. But if a K-word block is divided into p n-word pages and K is not a multiple of n, the last page frame to which the block is assigned will not be filled. The unusable space within the partially filled page frame is 'internal fragmentation'.
Advantages
• The chief advantage of paging is that data transfer between memory levels is simplified: an incoming page can be assigned to any available page frame.
• There is no external fragmentation problem, as any page fits into any free page frame.
• Paging is hidden from the user.
• This can be used in multiprogramming.
Disadvantages
• Protection facility is not available in paging.
• Internal fragmentation is a problem in paging.
3.7.1 Process of address translation in a Virtual Memory System
• The simplest method for translating a virtual address into a physical address is to assume that all programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory.
• Pages commonly range from 2K to 16K bytes in length. They constitute the basic unit of information that is moved between the main memory and the disk whenever the translation mechanism determines that a move is required. Pages should not be too small, because the access time of a magnetic disk is much longer (10 or 20 milliseconds) than the access time of the main memory.
• The reason for this is that it takes a considerable amount of time to locate the data on the disk, but once located, the data can be transferred at a rate of several megabytes per second. On the other hand, if pages are too large, it is possible that a substantial portion of a page may not be used, yet this unnecessary data will occupy valuable space in the main memory.
• A virtual memory address translation method based on the concept of fixed length pages is shown in Fig. 19. Each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high order bits) followed by an offset (low order bits) that specifies a location within a page. Information about the main memory location of each page is kept in a page table. This information includes the main memory address where the page is stored and the current status of the page. An area in the main memory that can hold one page is called a page frame.
The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained. The contents of this location give the starting address of the page if that page currently resides in the main memory.
• Each entry in the page table also includes some control bits that describe the status of the page while it is in the main memory. One bit indicates the validity of the page, i.e., whether the page is actually loaded in the main memory. This bit allows the operating system to invalidate the page without actually removing it. Another bit indicates whether the page has been modified during its residency in the memory. Other control bits indicate various restrictions that may be imposed on accessing the page.
Example
A virtual memory system has a 16K word logical address space and an 8K word physical address space, with a page size of 2K words. The page address trace of a program has been found to be:
7 5 3 2 1 0 4 1 6 7 4 2 0 1 3 5
List the four pages resident in the memory after each page reference for the following replacement policies:
(i) FIFO (ii) LRU
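One way to cross-check a hand-worked answer to this example is to simulate both policies. The sketch below is an illustration only: it assumes four page frames (8K words of physical memory divided by the 2K word page size), and the function names fifo and lru are chosen for this example.

```python
from collections import OrderedDict, deque

def fifo(trace, frames):
    """Return (resident pages after each reference, number of page faults) under FIFO."""
    resident, load_order, history, faults = set(), deque(), [], 0
    for page in trace:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(load_order.popleft())   # evict the page loaded earliest
            resident.add(page)
            load_order.append(page)
        history.append(sorted(resident))
    return history, faults

def lru(trace, frames):
    """Return (resident pages after each reference, number of page faults) under LRU."""
    recency, history, faults = OrderedDict(), [], 0
    for page in trace:
        if page in recency:
            recency.move_to_end(page)          # a hit makes this page the most recently used
        else:
            faults += 1
            if len(recency) == frames:
                recency.popitem(last=False)    # evict the least recently used page
            recency[page] = True
        history.append(sorted(recency))
    return history, faults

trace = [7, 5, 3, 2, 1, 0, 4, 1, 6, 7, 4, 2, 0, 1, 3, 5]
for name, policy in (("FIFO", fifo), ("LRU", lru)):
    history, faults = policy(trace, 4)
    print(name, "faults:", faults)
    for page, pages in zip(trace, history):
        print("  ref", page, "->", pages)
```

The two policies behave identically on this trace until the tenth reference (page 7): FIFO evicts page 1, the page that was loaded earliest, whereas LRU evicts page 0, because page 1 was referenced again at the eighth step.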
Solution
An asterisk (*) marks a reference that is a hit.
Reference  7  5  3  2  1  0  4  1  6  7  4  2  0  1  3  5
(i) FIFO
Frame 1    7  7  7  7  1  1  1  1* 1  7  7  7  7  7  3  3
Frame 2    -  5  5  5  5  0  0  0  0  0  0  2  2  2  2  5
Frame 3    -  -  3  3  3  3  4  4  4  4  4* 4  0  0  0  0
Frame 4    -  -  -  2  2  2  2  2  6  6  6  6  6  1  1  1
(ii) LRU
Frame 1    7  7  7  7  1  1  1  1* 1  1  1  2  2  2  2  5
Frame 2    -  5  5  5  5  0  0  0  0  7  7  7  7  1  1  1
Frame 3    -  -  3  3  3  3  4  4  4  4  4* 4  4  4  3  3
Frame 4    -  -  -  2  2  2  2  2  6  6  6  6  0  0  0  0
3.7.2 Implementation Methods
i. Paged Memory System
ii. Demand Paged Memory System
i. Paged Memory System
• A paging system uses fixed length blocks called pages and assigns them to fixed regions of physical memory called page frames. The main advantage of paging is that memory allocation is greatly simplified, since an incoming page can be assigned to any available page frame. Physical memory is broken into fixed-size blocks called frames.
• Logical memory is also broken into blocks of the same size called pages. When a program is to be executed, its pages are loaded into any available frames and the page table is defined to map user pages to memory frames.
• In Fig. 20(a), some of the frames in memory are in use and some are free. The list of free frames is maintained by the operating system. Process A, stored on disk, consists of four pages. When it is time to load this process, the O/S finds four free frames and loads the four pages of process A into the four frames.
• The O/S maintains a page table for each process. The page table shows the frame location for each page of the process. Each logical address consists of a page number and a relative address within the page. In paging, the logical to physical address translation is done by CPU hardware.
• Now the CPU must know how to access the page table of the current process. Presented with a logical address consisting of a page number and a relative address, the CPU uses the page table to produce a physical address consisting of a frame number and a relative address, as shown in the figure below.
• Hence, paging overcomes a lot of problems. Main memory is divided into many small equal size frames. Each process is divided into frame sized pages. A smaller process requires fewer pages, and a larger process requires more. When a process is brought in, its pages are loaded into available frames and a page table is set up.
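The translation from a logical address (page number, relative address) to a physical address (frame number, relative address) can be sketched in a few lines. This is only an illustration under stated assumptions: a page size of 2K words, a page table modelled as a dictionary, and a hypothetical PageFault exception standing in for the trap taken when the valid bit is clear.

```python
PAGE_SIZE = 2048  # assumed page size in words (2K)

class PageFault(Exception):
    """Raised when the referenced page is not valid in main memory (hypothetical name)."""

def translate(logical_address, page_table):
    """Translate a logical address into a physical address using a per-process page table."""
    page, offset = divmod(logical_address, PAGE_SIZE)   # page number and relative address
    entry = page_table.get(page)
    if entry is None or not entry["valid"]:
        raise PageFault(page)                           # page is not loaded in any frame
    return entry["frame"] * PAGE_SIZE + offset          # frame number and same relative address

# Illustrative page table: pages 0 and 1 are loaded, page 2 is still on disk.
page_table = {
    0: {"frame": 3, "valid": True},
    1: {"frame": 0, "valid": True},
    2: {"frame": None, "valid": False},
}
print(translate(5, page_table))
print(translate(PAGE_SIZE + 5, page_table))
```

The relative address is copied unchanged; only the page number is replaced by a frame number, which is why a page can be placed in any free frame.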
ii. Demand Paging
A demand paging system is similar to a paging system with swapping, which is shown in the figure.
• The figure shows the swapping of a paged memory to contiguous disk space residing on secondary memory. When we want to execute a process, we swap it into memory. However, rather than swapping the entire process into memory, we use a lazy swapper. A lazy swapper never swaps a page into memory unless that page will be needed.
• When a process is to be swapped in, instead of swapping in the whole process, the swapper brings only the necessary pages into the memory. Thus, it avoids reading into memory pages that will not be used anyway, decreasing the swap time and the amount of physical memory needed. We need some form of hardware support to distinguish between those pages that are in the memory and those pages that are on the disk. The valid-invalid bit can be used for this purpose.
• When this bit is set to 'valid', it indicates that the associated page is both legal and in memory. If the bit is set to 'invalid', it indicates that the page is either not valid or is valid but currently on disk.
• The page table entry for a page that is brought into memory is set as usual, but the page table entry for a page that is not currently in memory is simply marked invalid, as shown in the figure.
• If the process tries to use a page that was not brought into memory, the access to a page marked invalid causes a page fault trap. The paging hardware will notice that the invalid bit is set, causing a trap to the operating system. This trap is the result of the operating system's failure to bring the desired page into memory.
3.8 SECONDARY MEMORY TECHNOLOGY
Because every word in main memory is directly accessible in a very short time, main memory is relatively expensive. Secondary memories are used to hold large sets of data.
3.8.1 Magnetic Tape
• Magnetic tape was the first kind of secondary memory. A computer tape drive is analogous to a home tape recorder; a 2400 ft long tape is wound from the feed reel past the recording head to the take-up reel. By varying the current in the recording head, the computer can write information on the tape in the form of little magnetized spots.
• Magnetic tapes are sequential-access devices. If the tape is positioned at the
beginning, to read physical record n it is first necessary to read physical records 1 through n-1, one at a time. If the information desired is near the end of the tape, the program will have to read almost the entire tape, which may take several minutes. Forcing a CPU that can execute millions of instructions per second to wait 200 sec while a tape is advanced is wasteful. Tapes are most appropriate when the data must be accessed sequentially.

3.8.2 Magnetic Disks

• A disk is a piece of metal about the size and shape of an LP phonograph record, to which a magnetizable coating has been applied at the factory. Disks typically have a few hundred tracks per surface.
• Each drive has a movable head that can be moved closer to or farther from the centre. The head is small enough to read or write information from exactly one track. A disk drive often has several disks stacked vertically about an inch apart.
• The radial position of the heads is called the cylinder address. A disk drive with n platters will have 2n surfaces, hence 2n tracks per cylinder. Tracks are divided into sectors, normally between 10 and 100 sectors per track.
• A sector consists of a certain number of machine words, typically 32 to 256. On some disks the number of sectors per track can be set by the program.
• Each disk drive has a small special-purpose computer associated with it, called the disk controller. The controller helps to transfer information between main memory and the disk.
• To specify a transfer, the program must provide the following information: the cylinder and surface, which together specify a unique track; the sector number where the information starts; the number of words to be transmitted; the main memory address where the information comes from or goes to; and whether information is to be read from the disk or written onto it.
• Disk transfers always start at the beginning of a sector, never in the middle of it. When a multi-sector transfer crosses a track boundary within a cylinder, no time is lost, but if it crosses a cylinder boundary, one rotation time is lost on account of the seek.

3.8.3 Magnetic Drums

• A variation on the disk is the drum, a cylinder on which information can be recorded magnetically. Along the length of the drum are many fixed read/write heads. Each head can read or write one track.
• The tracks are again divided into sectors. Because the heads do not move, there is no seek time. Furthermore, several heads may be reading or writing in parallel. Drums have a smaller capacity than disks, but access is much faster as there is no seek time. Some drums have two or more sets of heads, spaced uniformly around the circumference of the drum.
• With two sets of heads, a given sector will always appear under one set of heads or the other within at most one-half of the rotation period. Fixed-head disks are logically similar to drums in that the heads do not move, but they have the physical appearance of a disk.
• Fixed disks are often combined with removable ones, the fixed disk for normal use and the removable one for making backups. Drums are never combined with anything else in one device.

3.8.4 Optical Memories

• Optical memories have become available. They have much higher recording densities than conventional magnetic media. For example, a strip of ordinary 35 mm black and white film 3 feet long can hold more information than a 2400 ft magnetic tape.
• An especially interesting optical memory is the video disk. Although these disks were originally developed for recording television programs, they can be put to more esthetic use as computer storage devices.
• The disks are inherently digital, with the information recorded as a sequence of bits burned into the surface by an electron beam or laser. One characteristic that limits their application, however, is that once a video disk has been written, it cannot be erased as a magnetic disk can be.

Example
A Winchester magnetic disk unit has a recording density of 40 × 10^6 bits per square inch of surface. The inner diameter of the recording area is 4 inches and the outer diameter is 7 inches.
i) What is the average bit density along a track if the radial track density is 2000 tracks/inch?
ii) What is the data transfer rate in bytes/sec at a rotational speed of 3600 rpm?
Solution
Given:
Number of bits per square inch (density) = 40 × 10^6
Inner diameter = 4 inches
Outer diameter = 7 inches
Total recording area = area (outer circle) - area (inner circle)
= (π/4) (7^2 - 4^2)
= 25.92 sq. inches
Total number of bits = bit density × recording area
= 40 × 10^6 × 25.92
= 1036.7 × 10^6 bits
Track density = 2000 tracks/inch
Radial width of the recording band = (outer diameter - inner diameter)/2 = (7 - 4)/2 = 1.5 inches
Total number of tracks = 2000 × 1.5 = 3000
Average number of bits per track = total number of bits / total number of tracks
= (1036.7 × 10^6) / 3000
≈ 345.6 × 10^3 bits per track
Equivalently, average bit density along a track = 40 × 10^6 / 2000 = 20,000 bits per inch of track.
Rotational speed = 3600 rpm = 60 revolutions/sec
Data transfer rate = bits per track × revolutions per second
= 345.6 × 10^3 × 60
≈ 20.7 × 10^6 bits/sec
≈ 2.59 × 10^6 bytes/sec
≈ 2.59 MB/sec
Ans:
i) Average bit density along a track = 20,000 bits/inch (about 345.6 × 10^3 bits/track).
ii) Data transfer rate ≈ 2.59 MB/sec.

Example
A high-speed tape system accommodates a 2400 ft reel of standard 9-track tape. The tape is moved past the recording head at the rate of 150 inches/sec.
i) What must be the linear tape recording density in order to achieve a data transfer rate of 10^6 bits/sec?
ii) If the tape is organized into blocks of 32 KB and a gap of 0.4 inches separates the blocks, what is the storage capacity of the tape?
Solution
Given: Data transfer rate = 10^6 bits/sec, tape speed = 150 inches/sec.
Transfer rate (bits/sec) = recording density (bits/inch) × tape speed (inches/sec)
Recording density = (10^6 bits/sec) / (150 inches/sec) ≈ 6667 bits/inch
Recording density = 6667 bpi.
Total tape length = 2400 × 12 = 28800 inches
Length of one block = (32 × 1024 × 8 bits) / 6667 bpi ≈ 39.32 inches
Length of one block plus its gap = 39.32 + 0.4 = 39.72 inches
Number of blocks = 28800 / 39.72 ≈ 725
Storage capacity = 725 × 32 KB = 23,200 KB ≈ 23.2 MB (taking 1 MB = 1000 KB).
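The tape calculation can be checked numerically. The sketch below is not part of the original solution; it assumes that every 32 KB block is followed by exactly one 0.4 inch gap, that 1 KB = 1024 bytes, and that only whole blocks are stored. The function names are hypothetical.

```python
import math

TAPE_LENGTH_IN = 2400 * 12      # 2400 ft reel, in inches
TAPE_SPEED_IPS = 150            # tape speed, inches per second
TRANSFER_RATE_BPS = 10**6       # required transfer rate, bits per second
BLOCK_BYTES = 32 * 1024         # 32 KB block (assumes 1 KB = 1024 bytes)
GAP_IN = 0.4                    # inter-block gap, inches

def recording_density(rate_bps, speed_ips):
    """Linear density (bits per inch) needed for a given transfer rate."""
    return rate_bps / speed_ips

def tape_capacity(length_in, density_bpi, block_bytes, gap_in):
    """Return (number of whole blocks, capacity in bytes)."""
    block_len_in = block_bytes * 8 / density_bpi       # inches per block
    n_blocks = math.floor(length_in / (block_len_in + gap_in))
    return n_blocks, n_blocks * block_bytes

density = recording_density(TRANSFER_RATE_BPS, TAPE_SPEED_IPS)
blocks, capacity = tape_capacity(TAPE_LENGTH_IN, density, BLOCK_BYTES, GAP_IN)

print(round(density))   # 6667
print(blocks)           # 725
print(capacity)         # 23756800
```

The byte count is 725 × 32 KB, which is 23,200 KB; whether that is quoted as 23.2 MB or 23.8 × 10^6 bytes depends only on the unit convention chosen.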
4 INPUT AND OUTPUT UNIT

4.1 I/O MAPPING / ADDRESSING METHODS

Here we define the addressing techniques for I/O devices. These are:
1. Memory mapped I/O
2. I/O mapped I/O

4.1.1 MEMORY MAPPED I/O

• It is programmed I/O with a shared memory and I/O address space. Programmed I/O requires all I/O operations to be executed under the direct control of the CPU, i.e. every data-transfer operation involving an I/O device requires the execution of an instruction by the CPU.
• The I/O device does not have direct access to main memory 'M'. A data transfer from an I/O device to 'M' (memory) requires the CPU to execute several instructions, including an input instruction to transfer a word from the I/O device to the CPU and a store instruction to transfer the word from the CPU to memory.
[Figure: memory mapped I/O organization]
• The I/O address space is always shared with the memory address space.
• The same control signals are used for memory as well as I/O control. These are i) memory read and ii) memory write.
• All memory-related instructions can be used for I/O operations.
• I/O data transfer can be done with respect to any general purpose register.
• The memory area is proportionately reduced as the I/O space increases.
• ALU operations may be performed directly on port data.
• The address lines of the system bus that are used to select memory locations can also be used to select I/O devices.
• An I/O device is connected to the bus via an I/O port, which from the CPU's perspective is an addressable data register.
• A technique used in many machines, such as the Motorola 680X0 series, is to assign a part of the main memory address space to I/O ports. This technique is called memory mapped I/O.

4.1.2 I/O MAPPED I/O

The organization is as follows:
[Figure: I/O mapped I/O organization]
• The memory and I/O address spaces are separate.
• This scheme is used in the Intel 80x86 microprocessor series.
• A memory referencing instruction activates the READ 'M' or WRITE 'M' control line, which does not affect the I/O devices.
• The CPU must execute separate I/O instructions to activate the READ I/O and WRITE I/O lines.
• Such an instruction causes a word to be transferred between the addressed I/O port and the CPU.
• An I/O device and a memory location can have the same address bit pattern without conflict.
• I/O mapped I/O defines separate I/O address space and memory address space.
• It uses separate control signals for memory and I/O devices. These are memory read, memory write, I/O read and I/O write.
• It uses dedicated instructions for I/O operations, e.g. IN and OUT. I/O data transfer is always with respect to the accumulator (a register) only.
• The addressable memory area is not reduced in this case.
• ALU operations cannot be directly performed on port data.

4.1.3 INTERRUPT BASED I/O

Advantages
1. The CPU does not have to stay in a program loop.
2. It increases the operation speed.
3. It provides external asynchronous input to the processor.
Disadvantages
1. Implementation cost is high.
2. Because of priorities, a low-priority interrupt may have to wait before it is serviced.

4.1.4 PROGRAMMED I/O

Advantages
1. Data transfer is controlled by software.
2. The I/O device does not have direct access to memory.
Disadvantages
1. This method is useful in small, low-speed computers only.
2. The CPU wastes time while checking the flag.

4.2 IOP (IO PROCESSOR)

Once interfaced to the main CPU, the IO processor relieves the main CPU from executing time-consuming IO data transfer operations. The CPU executes only a few IO instructions, which are used to initiate and terminate the execution of the IO program and to test the status of the IO device. The IO processor is initialized to execute the IO data transfer operations. An IOP like the 8089 is provided with two DMA channels, which are used for data transfer over the memory bus when the CPU does not require it.
The algorithm is as follows:

WAIT:
    If Attention = 1 then begin
        Fetch parameters from IOCR.
SETUP:
        Set up DMA control registers. Begin IO program execution; send command to IO device.
SEND:
        Transmit data word.
        If transmission error then EXIT.
        If not end of data items then SEND.
        If not end of IO program then SETUP.
EXIT:
        Place termination status in IOCR.
    End
    GO TO WAIT.

The general interface between the IOP and the CPU is shown in the figure.
[Figure: general interface between IOP and CPU]
The typical organization of the 8089 IO processor is shown in the figure.
[Figure: typical organization of the 8089 IO processor]
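The WAIT / SETUP / SEND / EXIT algorithm given for the IO processor can be sketched as a pair of nested loops. This is a simplified model under stated assumptions: the IO program is represented as a list of blocks of data words, the `transmit` callback stands in for the device, and the returned string stands in for the termination status placed in the IOCR. None of these names come from the 8089 itself.

```python
def run_iop_once(io_program, transmit):
    """Model one pass through SETUP/SEND/EXIT after Attention is raised.

    io_program: list of blocks, each a list of data words (hypothetical layout).
    transmit:   callable(word) -> True on success, False on transmission error.
    Returns the termination status that would be placed in the IOCR.
    """
    for block in io_program:          # SETUP: repeated until end of IO program
        # The DMA control registers would be set up here and a command sent.
        for word in block:            # SEND: repeated until end of data items
            if not transmit(word):
                return "ERROR"        # EXIT on transmission error
    return "DONE"                     # EXIT at end of IO program

sent = []

def transmit_ok(word):
    """Hypothetical device that accepts every word."""
    sent.append(word)
    return True

status = run_iop_once([[1, 2], [3]], transmit_ok)
print(status, sent)                                       # DONE [1, 2, 3]
print(run_iop_once([[1, 2], [3]], lambda w: w != 2))      # ERROR
```

After EXIT the IO processor would return to WAIT and remain there until the CPU raises Attention again; that outer loop is omitted here to keep the sketch short.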
• The IOP control unit uses two control signals: (i) Attention and (ii) Channel select.
• When IO operations are to be initiated, the CPU gives the attention signal to the IOP. The channel select signal is used to select a particular DMA channel.
• The DMA channels are identical. Each contains a set of registers: program counter (PC), IO address register (IOAR), data count (DC) and a set of miscellaneous registers. These registers are programmed by the CPU by executing IO instructions.
• When the data transfer operation via a particular DMA channel is over, the channel issues an interrupt request to the CPU indicating that the data transfer operation is completed.
• To gain control of the memory bus, the IO processor uses the normal DMA-related signals, DMA request and DMA acknowledge.
• The Bus Interface Unit is used to connect the channels to the external buses.
• A 20-bit ALU is used for address translation.
• Sometimes data transfer is to be performed between an 8-bit memory device and a 16-bit IO device or vice versa. The data must be properly formatted prior to the start of the data transfer. This operation is performed by the Assembly/Disassembly unit.

4.3 DIRECT MEMORY ACCESS

The hardware required to design Direct Memory Access is as shown below:
[Figure: DMA controller connected to the system bus]
• In this structure, both the CPU and the DMA controller have access to main memory via a shared system bus having data, address and control lines.
• The I/O devices are connected to the system bus for transferring data via a special interface circuit shown as "DMA controller".
• It contains a data register (IODR), an address register (IOAR) and a data count register (DC), which enable the DMA controller to transfer data to or from different regions of main memory. The IOAR register is loaded with the base address of the memory region where the transfer is to be done. This register is automatically decremented or incremented after each word is transferred.
• The data counter (DC) contains the number of words that remain to be transferred. This counter is automatically decremented after each word transfer and tested for zero. When DC reaches zero, the DMA controller stops the DMA transfer.
• The controller is normally provided with interrupt capability. It can send an interrupt to the CPU to indicate the end of the data transfer.
• DMA transfer can be done in two ways.
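The behaviour of the IOAR and DC registers described above can be sketched with a toy model. It is an illustration only: main memory is modelled as a Python list, the address register is assumed to count upwards, and the class and method names are invented for the example.

```python
class DMAController:
    """Toy model of the IOAR/DC registers (a sketch, not a real device)."""

    def __init__(self, memory):
        self.memory = memory       # main memory modelled as a list of words
        self.ioar = 0              # I/O address register: next memory address
        self.dc = 0                # data count: words still to be transferred
        self.interrupt = False     # end-of-transfer interrupt to the CPU

    def program(self, base_address, word_count):
        """CPU loads the initial register values before the transfer."""
        self.ioar = base_address
        self.dc = word_count
        self.interrupt = False

    def transfer_word(self, word):
        """Move one word from the device into memory, then update IOAR and DC."""
        if self.dc == 0:
            raise RuntimeError("no transfer programmed")
        self.memory[self.ioar] = word   # data register contents go to memory
        self.ioar += 1                  # address register incremented
        self.dc -= 1                    # count decremented and tested for zero
        if self.dc == 0:
            self.interrupt = True       # signal end of transfer to the CPU

memory = [0] * 8
dma = DMAController(memory)
dma.program(base_address=2, word_count=3)
for w in (10, 20, 30):
    dma.transfer_word(w)

print(memory)                              # [0, 0, 10, 20, 30, 0, 0, 0]
print(dma.ioar, dma.dc, dma.interrupt)     # 5 0 True
```

The CPU appears only in `program`; every word after that moves without an instruction being executed for it.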
4.3.1 DMA BLOCK TRANSFER

In DMA mode, the DMA controller is the master and controls the memory bus. This mode is needed by secondary memories like disk drives, where data transmission cannot be stopped or slowed without loss of data and where data is transferred in blocks. Block DMA transfer supports faster I/O data transfer rates, but the CPU remains inactive for relatively long periods because the system bus is tied up.

4.3.2 CYCLE STEALING

• This is an alternative to DMA block transfer. In this method, the system allows the DMA controller to use the system bus to transfer one word, after which it must return control of the bus to the CPU.
• A block of I/O data transferred using cycle stealing has the DMA controller's bus transactions interleaved with the CPU's bus transactions.
• This method reduces the maximum I/O transfer rate. It also reduces the interference of the DMA controller with the CPU's memory accesses.
• This interference is completely eliminated by designing the DMA interface so that system bus cycles are stolen only when the CPU is not actually using the system bus. This is called "Transparent DMA".

4.3.3 STEPS INVOLVED IN A DMA TRANSFER

1. The CPU executes two I/O instructions, which load the DMA registers IOAR and DC with their initial values. The IOAR is loaded with the base address of the region of memory used for the transfer. The DC contains the number of words to be transferred to or from memory.
2. When the DMA controller is ready to transmit or receive data, it activates the DMA REQUEST line to the CPU. The CPU waits for the next DMA breakpoint, then relinquishes control of the system bus and activates the DMA ACKNOWLEDGE signal.
3. The DMA controller now transfers data to or from main memory. After a word is transferred, it updates the DC and IOAR registers.
4. If DC is not yet zero but the I/O device is not ready to send or receive the next data, the DMA controller releases the system bus to the CPU by deactivating the DMA REQUEST line. The CPU responds by deactivating the DMA ACKNOWLEDGE line and resuming control of the bus.
5. If DC reaches zero, the DMA controller stops the transfer and sends an interrupt request signal to the CPU. The CPU responds by halting the I/O device or by initiating a new DMA transfer.

4.3.4 NEED OF DMA

A modest increase in hardware enables an I/O device to transfer a block of information to or from 'M' (memory) without CPU intervention. This requires the I/O device to generate memory addresses and transfer data to or from the system bus, with the CPU responsible only for initiating each block transfer. Hence the I/O device requires an interface between I/O data and main memory that can carry out the transfer without program execution by the CPU. Such an I/O interface circuit is called a DMA controller, and this level of I/O control is called Direct Memory Access (DMA).

4.3.5 WORKING OF DMA & ITS BENEFITS

DMA comes into use because of the following drawbacks of interrupt-driven I/O and simple programmed I/O.

Drawbacks
1. Data transfer must traverse a path through the CPU.
2. The I/O transfer rate is limited by the speed with which the CPU can test and service a device.
3. The CPU is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.
When large volumes of data are to be moved, a more efficient technique is required, and that technique is DMA.

4.3.6 DMA FUNCTION

DMA involves an additional module on the system bus, called the "DMA module", which is capable of taking over control of the system bus from the CPU.

4.3.7 Working

1. When the CPU wishes to read or write a block of data, it issues a command to the DMA module, which includes the following information:
a) Whether a read or write is requested
b) The address of the I/O device involved
c) The starting location in memory to read from or write to
d) The number of words to be read or written
2. After sending the above information, the CPU continues with its other work.
3. The DMA module transfers the entire block of data, one word at a time, directly to or from memory, without going through the CPU.
4. When the transfer is complete, the DMA module sends an interrupt signal to the CPU.
Thus, the CPU is involved only at the beginning and end of the transfer.

4.3.8 DMA Data Transfer

[Figure: DMA data transfer]
• It is the technique of transferring data directly between memory and an I/O device without CPU intervention.
• Data is transferred directly between memory and I/O devices under the supervision of extra hardware called the DMA controller (like the 8237).
• It is the fastest data transfer technique in the parallel I/O group.
• Initialization of the DMA controller is done by loading the memory address and word count into the channel registers of the DMA controller to which the I/O device is connected.
• The I/O device initiates a DMA operation by sending a DMA request (DRQ) to the DMA controller.

4.4 STEPS INVOLVED IN THE DMA OPERATION:

1. The I/O device asserts the DRQ signal.
2. The DMA controller sends the HOLD signal to the microprocessor.
3. The μP sends HLDA (hold acknowledge) back to the DMA controller, and the DMA controller takes charge of the system bus.
4. The DMA controller gives the DMA acknowledge (DACK) signal to the corresponding I/O device.
5. The DMA controller now places the memory address on the address bus. It reads the data byte from memory and transfers it to the I/O device.
6. The DMA controller updates the memory address register and the word count register.
7. When the internal count becomes zero, the DMA controller sets HOLD = 0 (this indicates to the μP that the DMA operation is over).
8. The processor now regains charge of the system bus and continues with normal operation.
There are various types of DMA transfer modes, as follows:
1. Byte/Cycle stealing mode.
2. Burst/Demand mode.
3. Continuous/Block mode.

4.4.1 BYTE/CYCLE STEALING MODE

When the DMA controller takes charge of the system bus, 1 byte is transferred between memory and the I/O device, and the system bus is then given back to the microprocessor. This is possible by stealing a CPU cycle when the processor is not using the system bus. This is also called Hidden DMA.
• Byte mode DMA gives high system performance by carrying out two operations simultaneously.
• The flowchart for byte mode DMA is as follows:
[Figure: flowchart for byte mode DMA]

4.4.2 BURST/DEMAND DMA

It is used for I/O devices that have high-speed data buffers, which are filled on a demand basis. The flowchart for demand mode DMA is as shown below:
[Figure: flowchart for demand mode DMA]
Example:
File transfer between memory and a printer. The printer prints the buffer contents at its own speed. If the data file to be printed is 10 Kbytes and the internal printer buffer is 4 Kbytes, then demand mode DMA is performed in 3 bursts (4 Kbytes, 4 Kbytes and 2 Kbytes).

4.4.3 CONTINUOUS/BLOCK MODE DMA

This DMA mode is used for transfers between memory and fast I/O devices. After the DMA controller gains control of the system bus, the entire data block is transferred between memory and the fast I/O device. Once the entire block has been transferred, control of the system bus is given back to the μP.
The flowchart is as shown below:
[Figure: flowchart for block mode DMA]

4.4.4 PROGRAMMED I/O TRANSFER

• Programmed I/O operations are the result of I/O instructions written in the computer program.
• In this method, the CPU stays in a program loop until the device indicates that it is ready for the transfer of data.
• It is a time-consuming process and keeps the processor busy needlessly.
• This problem can be avoided by using an interrupt facility and special commands to inform the interface to issue an interrupt request signal when the data are available from the device.
• Transfer of data under programmed I/O is between the CPU and the peripheral. The CPU initiates the transfer by supplying the interface with the starting address.

4.5 INTERRUPT-INITIATED I/O

The interrupt-initiated I/O transfer mode uses the interrupt facility.
• In this transfer mode, the interface continuously monitors the device flag and informs the computer, by an interrupt request, when the device is ready for transfer.
• The CPU responds to the interrupt signal by storing the return address from the program counter into a memory stack, and control then branches to a service routine that processes the required I/O transfer.
• There are two methods of choosing the branch address of the service routine: vectored interrupt and non-vectored interrupt.

4.5.1 DMA TRANSFER MODE

• In this mode, the peripheral device manages the memory buses directly, which improves the speed of transfer.
• During the transfer, the CPU is idle and has no control of the memory bus.
• A DMA controller takes control of the bus to manage the transfer directly between the I/O device and memory.
• There are various transfer modes possible:
[Figure: DMA transfer modes]
i) Burst transfer mode:
In this mode, a block sequence consisting of a number of memory words is transferred in a continuous burst.
ii) Cycle stealing mode:
It allows the DMA controller to transfer one data word at a time, after which it must return control of the bus to the CPU.

4.6 DATA TRANSFER TECHNIQUES

• Data transfer techniques are classified into two broad categories:
1) Intra system (implemented using parallel I/O).
2) Inter system (implemented using serial I/O).
• Parallel I/O can be further classified into three categories:
1) Programmed/Polled I/O
2) Interrupt driven I/O
3) DMA
• Similarly, serial I/O can be classified into two groups according to the data format used:
1) Asynchronous serial
2) Synchronous serial

4.6.1 PARALLEL DATA TRANSFER
• It is used for short-distance communication, generally between various components in a single computer system. Hence it is called Intra System Communication.
• It uses a parallel data bus for communication, usually 8/16/32 bits wide depending on the μP used.
• Data transfer takes place one byte/word/double word at a time through the parallel data bus.
• The parallel bus connects all components in a single computer system.
• The problem of cross talk/interference arises as the distance between devices increases.
• The cost of a parallel bus also increases as the distance increases.

4.6.2 PROGRAMMED/POLLED DATA TRANSFER

• The microprocessor executes a program for a data transfer between any two devices in the system.
• Consider the transfer of data from memory to an I/O device.
[Figure: programmed/polled data transfer]
• Every byte to be transferred is taken from memory into the μP; then, by executing an OUT instruction, that byte is transferred to the I/O device via an I/O port.
• During the transfer of every byte, the processor also checks the ready condition of the corresponding I/O device. If the I/O device is not ready, the processor keeps polling it until it becomes ready.
• When the I/O device becomes ready, the processor completes that data transfer.
Advantages
Implementation of programmed (polled) data transfer is very simple.
Disadvantages
• μP time is wasted in polling the ready status of the I/O device. I/O devices are generally slower than memory or the processor, so most of the processor time is wasted in polling I/O devices.
• The microprocessor is involved in the operation.

4.6.3 INTERRUPT DRIVEN DATA TRANSFER

• In this technique the processor does not check the ready status of the I/O device.
• The I/O device itself sends an interrupt signal (INT) to the microprocessor when it becomes ready for data transfer.
[Figure: interrupt driven data transfer]
• The μP performs its ordinary data processing task (main program) while the I/O device is not ready for data transfer.
• When the I/O device becomes ready, it sends an interrupt signal (INT) to the μP. Control is then transferred to the Interrupt Service Routine (ISR).
• The ISR simply takes a data byte from memory and transfers it to the ready I/O device through an I/O port by executing an OUT instruction.
• The ISR transfers control back to the main program by a return operation (RET). The processor then continues executing the main program for as long as the I/O device is not ready.
• When the I/O device again becomes ready to accept the next data byte, the ISR is executed for the next byte transfer, and control subsequently returns to the main program.
• Thus the processor executes the main program and the ISR in an interleaved fashion.
• Here, the processor does not remain idle.
Disadvantage
The I/O device has to be connected to one of the interrupt lines of the processor.
Advantage
The microprocessor's time is not wasted at all in the data transfer operation.

4.6.4 DMA TRANSFER

Interrupt based data transfer | DMA based data transfer
1. The I/O devices interrupt the microprocessor directly by the (INT) command. | The I/O devices send their request through the DMA controller.
2. The processor is involved in the data transfer to or from the I/O device. | The processor is not involved in the process of data transfer.
3. The processor does not lose control over the system bus at any time. | The processor loses control over the system bus, and the DMA controller takes charge of the bus until the data transfer is completed. It returns the system bus to the μP after completion of the data transfer.
4. If any interrupt occurs in the main program, the processor executes the corresponding ISR and then returns to the next line of the same program. | There are various methods of DMA data transfer: Byte/cycle stealing mode, Burst/Demand mode, Continuous/Block mode.
5. No extra hardware is required in this method. | Extra hardware is required, called the DMA controller (like the 8237).

There are a few similarities between interrupt driven data transfer and DMA based data transfer:
1. Both interrupt based and DMA based data transfers are used for transferring data to or from I/O devices.
2. In both, I/O devices send a request to get served.
3. In both, memory read as well as I/O write operations can take place.
4. In both, data transfer takes place through the system buses only.
5. In both, microprocessor time is not wasted in waiting for the device.

Interrupts are requested and acknowledged in much the same way as DMA requests. However, an interrupt is not a request for bus control; rather, it asks the CPU to begin executing an interrupt service program. The interrupt program performs tasks such as initiating an IO operation or responding to an error encountered by the IO device. The CPU transfers control to this program in essentially the same way it transfers control to a subroutine. The CPU responds to interrupts only between instruction cycles.

4.7 RESPONSIBILITIES OF I/O INTERFACE

• The input-output subsystem of a computer provides an efficient mode of communication between the central system and the outside environment. Programs and data must be entered into computer memory for processing, and results obtained from computation must be recorded or displayed.
• Input or output devices attached to the computer, either on-line or off-line, are called peripherals.
4.7.1 INPUT-OUTPUT INTERFACE

• The input-output interface provides a method for transferring information between internal storage and external I/O devices.
• Peripherals connected to a computer need special communication links to interface them with the central processing unit.
• The purpose of the communication links is to resolve the differences that exist between the central computer and each peripheral.
• The major differences between the CPU and peripherals are as follows:
1) Peripherals are electromechanical and electromagnetic devices, and their manner of operation is different from the operation of the CPU and memory.
2) The data transfer rate of peripherals is usually slower than the transfer rate of the CPU.
3) Data codes and formats in peripherals differ from the word format in the CPU and memory.
• To resolve these differences, computer systems include special hardware components between the CPU and the peripherals to supervise and synchronize all input and output transfers. These components are called interface units.
• Each device may have its own controller that supervises the operation of the particular mechanism in that peripheral.
• The I/O interface has two major roles, as follows:
1) Interface to the CPU and memory via the system bus.
2) Interface to one or more I/O devices via the data links.
• Links to peripheral devices are used to exchange control, status and data between the I/O system and the peripheral.
• The I/O interface contains logic for performing communication functions between the peripheral and the bus.
• The I/O system must have an interface internal to the computer and an I/O interface external to the computer.
• The major requirements for an I/O system are:
1) Control and timing
2) CPU communication
3) Device communication
4) Data buffering
5) Error detection
• Control and timing is required to co-ordinate the flow of traffic between internal resources and external devices.
• CPU communication involves the exchange of data between the CPU and the I/O system over the data bus.
• Data buffering is required because of the difference between the data transfer rate of the CPU and memory and that of the peripherals.
• Data coming from memory or the CPU are sent to the I/O system buffer and then sent to the peripheral device at its own data rate.
• The I/O system is also responsible for error detection and for reporting errors to the CPU.

4.7.2 I/O INTERFACE CIRCUITS

• The task of connecting an I/O device to a computer system is greatly simplified by the use of standard ICs, variously known as I/O interface circuits.
• These circuits allow I/O devices to be connected to a standard bus with minimum additional hardware or software.

4.8 IBM 370 I/O CHANNEL

• The I/O processor in the IBM 370 computer is called an I/O channel.
• There are three types of channels:
i) multiplexer
ii) selector and
iii) block multiplexer.
• The multiplexer channel can be connected to a number of slow and medium speed devices and is capable of operating with a number of I/O devices simultaneously.
• The selector channel is designed to handle one I/O operation at a time and is normally used to control one high-speed device.
• The block multiplexer channel combines the features of both the multiplexer and selector channels.
• It provides a connection to a number of high-speed devices, but all I/O transfers are conducted with an entire block of data, as compared to a multiplexer channel, which can transfer only one byte at a time.
[Figure: word formats associated with the IBM 370 channel operation]
• The CPU communicates directly with the channel through dedicated control lines and indirectly through reserved storage areas in memory. The figure shows the word formats associated with the channel operation.
• The I/O instruction format has three fields:
(i) Operation code
(ii) Channel address and
(iii) Device address
• The computer system may have a number of channels, and each is assigned an address. Each channel may be connected to several devices, and each device is assigned an address.
• The operation code specifies one of eight I/O instructions: start I/O, start I/O fast release, test I/O, clear I/O, halt I/O, halt device, test channel and store channel identification.
• The addressed channel responds to each I/O instruction and executes it.
• The four condition codes in the Processor Status Word (PSW) specify whether the channel or the device is busy, whether it is operational or not, whether interrupts are pending, and whether the I/O operation started successfully.
• The status field identifies the state of the device and the channel, and any errors that occurred during the transfer.
• The format of the Channel Command Word is shown in the figure.
• The data address field specifies the first address of a memory buffer, and the count field gives the number of bytes involved in the transfer.
• The command field specifies an I/O operation, and the flag bits provide additional information for the channel.
• The command field corresponds to an operation code that specifies one of the basic types of I/O operations:
1. Write: Transfer data from memory to I/O device.
2. Read: Transfer data from I/O device to memory.
3. Read backwards: Read magnetic tape with the tape moving backward.
4. Control: Used to initiate an operation not involving transfer of data.
5. Sense: Informs the channel to transfer its channel status word to memory.
6. Transfer in channel: Used as a jump instruction within the channel program.

4.8.1 BUS ARBITRATION

• Several master or slave units connected to a shared bus may request access to the bus at the same time. The mechanism that selects among them is called bus arbitration. There are various types of arbitration schemes:
• Daisy chaining
• Polling
• Independent requesting
4.8.2 DAISY CHAINING

• This method involves three control signals to which we assign the
generic names BUS REQUEST, BUS GRANT and BUS BUSY.
• The bus controller responds to a BUS REQUEST signal only if BUS
BUSY is inactive. The unit that is granted the bus keeps BUS BUSY
active for the duration of its new bus activity.
• When the first unit requesting access to the bus receives BUS
GRANT, it blocks further propagation of that signal, activates BUS
BUSY and begins to use the bus. When a non-requesting unit receives
the BUS GRANT signal, it forwards the signal to the next unit.
• If two units simultaneously request the bus from the controller,
then the one that receives BUS GRANT first gets access to the bus.
• Selection priority is therefore determined by the order in which
the units are linked by the BUS GRANT line.

4.9 POLLING

• In this scheme, polling replaces the BUS GRANT line of the
daisy-chain method with a set of poll-count lines that are
connected directly to all units on the bus, as shown below:
• In response to a signal on BUS REQUEST, the bus controller
proceeds to generate a sequence of numbers on the poll-count lines.
• The priority of a bus unit is determined by the position of its
address in the polling sequence. This sequence can be programmed if
the poll-count lines are connected to a programmable register.
Hence selection priority can be altered under software control.
• The advantage of polling over daisy chaining is that in polling a
failure in one unit need not affect other units.
• The flexibility is achieved at the cost of more control lines.
Also the number of units that can share the bus is limited by the
addressing capability of the poll-count lines.

4.10 INDEPENDENT REQUESTING

• It has separate BUS REQUEST and BUS GRANT lines for every unit.
• It provides the bus controller with immediate identification of
all requesting units.
• It responds rapidly to requests for bus access.
• The bus control unit determines priority, which is programmable.
• The main drawback of bus control by independent requests is the
fact that 2n BUS REQUEST and BUS GRANT lines must be connected to
the bus controller in order to control n devices.
• Daisy chaining requires two such lines, while polling requires
approximately log2(n) lines.
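
The selection rules of the three schemes can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real bus controller: the function names, the list-of-booleans representation of BUS REQUEST and the rule "unit 0 is nearest the controller" are assumptions made for the example.

```python
import math

def daisy_chain_grant(requests):
    """Return the index of the unit that captures BUS GRANT, or None.

    requests[i] is True if unit i is asserting BUS REQUEST. Unit 0 is
    assumed to be the first unit on the BUS GRANT chain.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position  # this unit blocks further propagation
        # a non-requesting unit forwards BUS GRANT to the next unit
    return None

def polling_grant(requests, poll_sequence):
    """Return the first address in poll_sequence whose unit is requesting."""
    for address in poll_sequence:
        if requests[address]:
            return address
    return None

def arbitration_lines(n):
    """Request/grant-type line counts quoted in the text for n units."""
    return {
        "daisy_chaining": 2,
        "polling": max(1, math.ceil(math.log2(n))),
        "independent_requesting": 2 * n,
    }

requests = [False, True, False, True]
print(daisy_chain_grant(requests))            # priority fixed by chain order
print(polling_grant(requests, [3, 2, 1, 0]))  # priority set by poll sequence
print(arbitration_lines(8))
```

Changing the list passed as poll_sequence changes the priority order without rewiring anything, which is the software-alterable priority that polling offers over daisy chaining.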
4.11 LOCAL COMMUNICATION

• Local communication is also called bus communication. The various
processor-level components (CPU, IOP, main memory, I/O devices) of
a computer system are interconnected by buses.
• Many bus organizations are possible. Two very common types are
shown as follows:

4.11.1 SINGLE SHARED BUS:

A system bus is shared by all components. At any time only two units
can communicate via the system bus.

4.11.2 SEPARATE MEMORY AND I/O BUSES:

Large computers with separate I/O processors frequently employ the
dual bus system. Here I/O buses are provided for communication
between the IOP and the set of I/O devices.

4.11.3 LONG DISTANCE COMMUNICATION

• Many computer systems have been designed in which the component
parts are separated by large distances.
Example:
A simple time sharing network, as shown in the following figure,
connects many user terminals to a remotely located computer via the
public telephone system.
• A device called a multiplexer is a small computer designed to
connect the users to a remote computer via a single long distance
path.
• The multiplexer time shares the long distance line among the
relatively slow user terminals.
• It gives each user the impression of being directly connected to
the remote computer.

4.11.4 INTERCONNECTION STRUCTURE

• It is defined as a graph whose nodes represent components of the
system such as processors, memories, etc.
• The edges represent physical communication paths such as buses or
long distance transmission paths.
• A path used to link only two devices is said to be dedicated.
• A path used to transfer information between different devices at
different times is said to be time shared.
The following figure shows four units connected by dedicated links.
• Dedicated links allow very fast transfer of information through
the system.
• All n devices may send or receive information simultaneously, and
there is no delay due to busy connections.
• Systems with dedicated links are inherently reliable, since the
failure of any link affects communication only between the two
units connected to that link.

4.11.5 BUS CONTROL

• In most computers, the CPU is the usual bus master, while the
memory and I/O interface circuits are the slaves.
• An IOP and certain other I/O controllers can also serve as the bus
master.
• Only a master can initiate a data transfer.
• A bus slave can only respond to commands issued by a bus master.
• In synchronous buses each item is transferred during a time slot
known in advance to both the source and destination units. This
implies that the bus interface circuits of both units are
synchronized.
• Synchronization can be achieved by driving both units from a
common clock source, a method that is feasible over short
distances.
• An alternative approach widely used in local bus communication is
asynchronous communication, in which each item being transferred is
accompanied by a separate control signal to indicate its presence
to the destination unit.
• The destination unit may respond with another control signal to
acknowledge receipt of the information.
• As each device can generate these control signals at its own rate,
the data transmission rate can vary with the inherent speed of the
communicating devices.
• This flexibility in transmission rates is achieved at the cost of
more complex bus control circuitry.

4.11.6 INTERRUPT MECHANISM

• Interrupts are used for any infrequent or exceptional event that
causes a CPU to temporarily transfer control from its current
program to another program.
• An interrupt handler services the event.
• Interrupts are the primary means by which I/O devices obtain the
services of the CPU.
• I/O interrupts are external requests to the CPU to initiate or
terminate an I/O operation.
• Interrupts are also produced by hardware or software error
detection circuits that invoke error handling routines within the
operating system.
• A power supply failure at any instant can generate an interrupt
that requests execution of an interrupt handler designed to save
critical data about the system's state.
• Interrupts generated internally by the CPU are called traps.
• An operating system will interrupt a user program that exceeds its
allotted time.
• The basic method of interrupting the CPU is by activating a
control line with the generic name INTERRUPT REQUEST that connects
the interrupt source to the CPU.
• An interrupt indicator is stored in a CPU register. This register
is tested periodically, usually at the end of every instruction
cycle.
• On recognizing the presence of an interrupt, the CPU must execute
a specific interrupt servicing program.
• A problem is caused by the presence of two or more interrupt
requests at the same time.
• Priorities must be assigned to the interrupts, and the interrupt
with the highest priority is selected for service.
• When an interrupt occurs, the following steps are taken by the
CPU:
1) The CPU identifies the source of the interrupt by polling the I/O
devices.
2) The CPU obtains the memory address of the required interrupt
handler. This address can be provided by the interrupting device
along with its interrupt request.
3) The program counter (PC) and other CPU status information are
saved in memory.
4) The program counter (PC) is loaded with the address of the
interrupt handler. Execution proceeds until a return instruction is
encountered, which transfers control back to the interrupted
program.
• Interrupts are maskable as well as non-maskable.
• Maskable interrupts can be enabled or disabled by instructions in
the instruction set.
• When a higher-priority interrupt is being serviced, the
lower-priority interrupts are disabled by the CPU.
• Low-priority interrupts are served when no interrupts with higher
priority are awaiting execution.

4.11.7 SINGLE LEVEL AND MULTILEVEL INTERRUPTS

• The microprocessor has only one general interrupt request line
(INT) to which multiple interrupting devices are connected.
• The interrupt is sensed by the microprocessor. It sets the internal
interrupt flip-flop to logic 1 and then gives the corresponding
acknowledgement (INT ACK).
• The interrupt acknowledgement is used to locate the device which
has actually interrupted the microprocessor.
• When the microprocessor is interrupted by a device, it executes
the corresponding ISR to service that device.
• In a single level interrupt system virtually any number of
interrupting I/O devices can be connected.
• Disadvantage: Selection of the I/O device which has actually
interrupted the processor is a time consuming process.

4.11.8 MULTI-LEVEL / MULTI LINE INTERRUPT

• Here a limited number of interrupting devices are connected to the
processor, equal to the number of general interrupt request lines
available to the processor.
• For each interrupt request line the processor has one interrupt
flag bit.
• For each interrupt request line there is a separate ISR used to
service that I/O device.
• If multiple interrupt requests arrive from I/O devices, the
processor internally resolves their priorities and executes the
corresponding ISR. It provides fast response to interrupt requests
coming from I/O devices.
• Drawback: Only a limited number of I/O devices can be connected.

4.11.9 VECTORED INTERRUPTS

There are two possible implementations of vectored interrupts,
depending on the type of microprocessor.
• Here, the interrupt requests coming from I/O devices are stored in
the interrupt register by setting the corresponding bits.
• The interrupt mask register is user programmable and is used to
disable (mask) the corresponding interrupt requests.
• When multiple interrupts are forwarded to the input of the
priority encoder, it encodes only the highest-priority interrupt
input and generates the corresponding code.
This code is inserted at predefined locations in the Program Counter
(PC).

Vector address given by the I/O device:
• This system is implemented as a multi level interrupt, where a
separate interrupt request line (IRQ) and acknowledgement line
(ACK) are used for each I/O device.
• When multiple interrupt requests come from multiple I/O devices,
the priority control logic resolves the priority and sends a common
interrupt request to the CPU.
• The CPU sends an interrupt acknowledge to the priority control
logic, which in turn gives an acknowledgement to the
highest-priority interrupting device.
The starting address of the ISR is then available to the processor.
It is loaded into the program counter (PC) and control is transferred
to execute the ISR and to service that I/O device.
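
The interrupt register, mask register and priority encoder described above can be modelled with a few bit operations. The sketch below is a simplified illustration under stated assumptions: the function name, the vector base address 0x0040, the 4-byte table entry size and the rule that lower bit numbers have higher priority are all hypothetical choices, not properties of any particular microprocessor.

```python
def pending_vector(interrupt_register, mask_register,
                   vector_base=0x0040, entry_size=4):
    """Return (level, handler_address) for the request to be serviced.

    Bit i of interrupt_register is set when device i has requested
    service. Bit i of mask_register is 1 when that request is enabled.
    Lower bit numbers are treated as higher priority (an assumption
    of this sketch). Returns None when no enabled request is pending.
    """
    active = interrupt_register & mask_register
    if active == 0:
        return None
    # Priority encoder: isolate the lowest set bit and take its index.
    level = (active & -active).bit_length() - 1
    # The encoded level selects an entry in a hypothetical vector table.
    return level, vector_base + level * entry_size

print(pending_vector(0b1010, 0b1111))  # devices 1 and 3 pending, all enabled
print(pending_vector(0b1010, 0b1000))  # device 1 masked off
print(pending_vector(0b0001, 0b1110))  # only pending request is masked
```

Because the base address is aligned, adding level * entry_size is equivalent to inserting the encoder output into fixed bit positions of the address loaded into the PC.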
5 MULTIPLE PROCESSOR ORGANISATION

5.1 FLYNN’S CLASSIFICATION OF

The classification suggested by Flynn is based on the multiplicity of
the instruction stream (IS) and the data stream (DS).

5.1.1 SINGLE INSTRUCTION SINGLE DATA (SISD)

This is the normal single processor system (uniprocessor), which
contains a single decoded instruction stream (IS) that operates on a
single data stream (DS).
• Here the control unit (CU) fetches the instruction stream from the
processing element memory (PEM), which is decoded to generate a
single decoded instruction stream (IS). This is executed by the
processing element (PE) using a single data stream (DS).
• Once the result data is generated by the PE, it is stored back to
the PEM.
Examples of SISD architecture are the Intel 8085, 8086 and 80386.

5.1.2 SINGLE INSTRUCTION MULTIPLE DATA (SIMD)

This is also referred to as an Array Processor. Here the single
instruction stream is fetched from shared memory modules or taken
from an external front end system. It is decoded by a single control
unit (CU) to generate the decoded instruction stream (IS). This
instruction is broadcast to multiple processing elements (PEs), which
operate on different data sets. Hence the name SIMD. The results
generated by the processing elements (PEs) are stored back into the
memory modules.

5.1.3 MULTIPLE INSTRUCTION SINGLE DATA (MISD)

Here multiple instruction streams are fetched from shared memory
modules by multiple control units, which in turn generate multiple
decoded instruction streams (IS). These operate on a single data
stream (DS) taken from shared memory modules. The MISD architecture
has not been realized; only a prototype model has been designed.

5.1.4 MULTIPLE INSTRUCTION MULTIPLE DATA (MIMD)

This is also referred to as a multiprocessor. The diagram is as
shown.
Here, multiple instruction streams are fetched by the control units.
These instruction streams are decoded to get multiple decoded
instruction streams (IS), which operate on multiple data streams (DS)
taken from shared memory modules. Here, each processing element
executes one instruction stream and operates on exactly one data
stream.

5.2 MULTIPROCESSOR

In MIMD (shared memory), all of the processors access the same pool
of main memory, usually via a shared high-speed bus.
• Communication among the processors is easy.
• Each processor can leave a message or data results in a particular
location and then tell another processor the address at which to
find the data.
• This is typical of multiprocessing, as distinct from
multiprocessors.
• With multiprocessing, a computer system is running one or more
applications that are broken up into a number of co-operating
sequential processes.
• Such a scheme can be implemented on a single-processor system, but
it is also easily implemented on a multiprocessor.
• At any time, each of the multiple processors is executing a
separate process.
• The communication among processes is done by messages and flags
that can be passed among processors via main memory.
• In practice, performance requirements complicate the requirement
for communication among processors.
• When we have many fast processors competing for access to the same
memory across the same bus, contention can seriously degrade
overall performance.
• The solution is to add a local cache to each processor. This
solution brings in the new problem of cache coherency, which
requires a coherence protocol such as MESI.
• The co-ordination among processors required to execute the MESI
protocol creates its own bottleneck on the shared bus.
• The result is that typical multiprocessor systems are limited to a
few tens of processors.
• A shared-memory multiprocessor with thousands of processors does
not appear to be practical.

5.3 PARALLEL PROCESSING APPLICATIONS

For large data crunching in a short span of time, conventional
uniprocessor systems are inefficient. Hence in such applications
parallel architectures and parallel algorithms are often used.
Some applications of parallel processing are mentioned below.

1. Numerical Weather Forecasting
Huge raw information is collected, generally for the period of the
last 48 hours, and based on some predictions as well as on all
atmospheric parameters (atmospheric pressure, temperature, humidity,
etc.) the weather forecast for the next 24 hours is estimated.

2. Oceanography and Astrophysics
It is used for the prediction of wealth in the oceans.

3. Socio-Economics
It is generally used for modeling the economy of a nation or even of
the world.

4. Finite Element Analysis
It is used in the design of large structures like dams, supersonic
jets, ships, etc. In these designs a large number of partial
differential equations have to be solved concurrently, hence parallel
architectures and algorithms are used.

5. Artificial Intelligence and Automation
i) Image processing
ii) Pattern recognition
iii) Natural Language Processing (NLP)
iv) Expert systems

6. Seismic Exploration
An ultrasonic wave is generated underground and its spectrum is
recorded and analyzed at a certain distance using sensors and a
parallel architecture. This is useful in the determination of
underground strata.

7. Genetic Engineering
Used for the study of complex molecules.

8. Weapon research and Defense
Used for weapon research and defense.

9. Medical application
It is used for scanning the human brain, or even the whole body,
using CAT scanners.

10. Remote sensing application
Computer analysis of remotely sensed earth resource data has many
potential applications in agriculture, forestry, geology and water
resources. Explosive amounts of pictorial information need to be
processed in this area.

11. Energy Resource Exploration
Energy affects the progress of the entire economy on a global basis.
Computers can play an important role in the discovery of oil and gas
and the management of their recovery, in the development of workable
plasma fusion energy and in ensuring nuclear reactor safety.

5.4 MULTIPROCESSOR ARCHITECTURE

This is nothing but the MIMD architecture. Multiprocessors are
classified into two categories:
i) Loosely coupled multiprocessor
ii) Closely (tightly) coupled multiprocessor.

5.5 LOOSELY COUPLED MULTIPROCESSOR

• This system contains a number of computer modules, each containing
a processor, local memory and I/O devices.
• A switch called the CAS is used to connect the computer module to
a message transfer system (MTS), which is generally a time shared
bus. It is used for message transfer between two computer modules.
• The diagram is as follows:
P : Processor
LM : Local Memory
IO : I/O devices
CAS : Channel & Arbiter Switch
• Here the work load is distributed among the various computer
modules.
• One computer module, together with the multiprocessor operating
system, is responsible for distribution of the work load.
• Each computer module carries out the assigned work locally in a
distributed/independent fashion. Hence it is sometimes called a
distributed system.
• There is very little interaction between the various
processors/computer modules through the time-shared bus, hence it
is called a "Loosely Coupled System."
• The performance of a loosely coupled system (LCS) depends on the
following factors:
1. Bandwidth of the time shared bus, which defines the speed of
message transfer.
2. Message length.
3. Message arrival rate, which indicates the degree of
interaction/coupling between the various computer modules.

SIMD versus MIMD

1. SIMD: It is also called an Array Processor.
   MIMD: It is also called a Multiprocessor.
2. SIMD: Here, a single stream of instructions is fetched.
   MIMD: Here, multiple streams of instructions are fetched.
3. SIMD: In SIMD (single instruction multiple data) the instruction
   stream is fetched from shared memory.
   MIMD: In MIMD (multiple instruction multiple data) the
   instruction streams are fetched by the control units.
4. SIMD: Here the instruction is broadcast to multiple processing
   elements (PEs), which operate on different data sets (i.e. data
   streams are taken from shared memory modules). Hence the name
   SIMD.
   MIMD: Here the instruction streams are decoded to get multiple
   decoded instruction streams (IS), which operate on multiple data
   streams (DS) taken from shared memory modules.
5. SIMD: Here, mI = 1 and mD > 1. Example: ILLIAC IV has a single
   program-control unit and many independent execution units.
   MIMD: Here, mI > 1 and mD > 1. This covers multiprocessors, which
   are computers with more than one CPU and the ability to execute
   several programs simultaneously.
mI : minimum number of instruction streams
mD : minimum number of data streams

Loosely Coupled Multiprocessor versus Tightly Coupled Multiprocessor

1. Loosely coupled: Communication between computer modules takes
   place through the message transfer system.
   Tightly coupled: Communication between processors takes place
   through the PMIN.
2. Loosely coupled: The processors do not share memory, and each
   processor has its own local memory.
   Tightly coupled: There is a single system-wide primary memory
   that is shared by all the processors.
3. Loosely coupled: All physical communication between the
   processors is done by passing messages across the network that
   interconnects the processors.
   Tightly coupled: Communication between the processors usually
   takes place through the shared memory.
4. Loosely coupled: The interaction between the various processors
   and computer modules is very low.
   Tightly coupled: The interaction between the various processors
   is very high.
5. Loosely coupled: A CAS (Channel and Arbiter Switch) is used to
   connect the computer module to the message transfer system.
   Tightly coupled: Shared memory modules can communicate through
   the PMIN.
6. Loosely coupled: The processor is directly connected to the I/O
   devices.
   Tightly coupled: The processors are sometimes referred to as a
   tightly coupled system.
7. Loosely coupled: Distributed memory multiprocessors are sometimes
   referred to as loosely coupled systems.
   Tightly coupled: Shared-memory multiprocessors are sometimes
   referred to as tightly coupled systems.
8. Loosely coupled: There are no unmapped local memories.
   Tightly coupled: Every processor has a small unmapped local
   memory which is used to store code.

5.6 SERIAL COMMUNICATION

Devices such as the keyboard and mouse are connected directly to the
computer with which they are used, typically through a serial
communication link.

5.7 ASYNCHRONOUS TRANSMISSION

• The simplest scheme for serial communication is asynchronous
transmission using a technique called start-stop. Data are
organized in small groups of 6 to 8 bits with a well-defined
beginning and end for timing recovery.
• In a typical arrangement, alphanumeric characters encoded in 8
bits are transmitted as shown in figure.
→ The line connecting the transmitter and receiver is in the 1 state
when idle.
→ Transmission of a character is preceded by a 0 bit, referred to as
the start bit, followed by eight data bits and one or two stop
bits.
→ The stop bits have a logic value 1. The start bit alerts the
receiver that data transmission is about to begin.
→ Its leading edge is used to synchronize the receiver's clock with
that of the transmitter. The stop bits at the end delineate
consecutive characters in the case of continuous transmission.
→ Inserting and removing the start and stop bits is the
responsibility of the transmission and reception circuitry.

• For proper synchronization at the receiving end, the receiver
clock is derived from a local clock whose frequency is higher than
the transmission rate (typically 16 times higher).
• This clock is used to increment a modulo-16 counter, which is
reset to 0 when the leading edge of a start bit is detected. When
the counter counts up to 8, it indicates that the middle of the
start bit has been reached. The value of the start bit is sampled
to confirm that it is a valid start bit, and the counter is again
reset to 0. Thereafter, whenever the count reaches 16, the incoming
data signal is sampled, which should be close to the middle of each
bit transmitted.
• Therefore, as long as the relative positioning of bits within a
transmitted character is not in error by more than half a bit
period (eight cycles of the local clock), the receiver correctly
interprets the bits of the encoded character.
• While transmitting characters, they are represented by the 7-bit
ASCII code occupying bits 0 through 6 in figure above. The MSB,
i.e. bit 7 of the transmitted byte, is usually set to 0.
Alternatively, it may be used as a parity bit, to aid in detecting
transmission errors.
Parity bit = 1 ⇒ transmitted data contains an odd number of 1s
Parity bit = 0 ⇒ otherwise
• When a parity bit is used, it is set by the transmitter such that
the parity of the 8 bits transmitted is always the same, i.e.
either odd or even.
If a transmission error causes the value of one bit to change, the
receiver will detect an incorrect parity and hence will be able to
determine that an error has occurred.
• Disadvantage
In the start-stop method explained above, the position of the 1 to 0
transition at the beginning of the start bit, as shown in figure
above, is the key to obtaining correct timing.
Therefore, this scheme is useful only where the speed of
transmission is sufficiently low and the conditions on the
transmission link are such that the square waveform shown in the
figure maintains its shape. At higher speeds and over longer lines,
much signal degradation takes place.

5.8 SYNCHRONOUS TRANSMISSION

In synchronous transmission, data are transmitted in blocks
consisting of several hundreds or thousands of bits each. The start
and end of each block are marked by appropriate codes, and data
within a block are organized according to an agreed upon set of
rules. For the transmission and detection of carrier frequencies and
the establishment of synchronization, modems require a significant
start-up time.

5.9 SOLVED EXAMPLES

1. An asynchronous serial communication controller that uses a
start-stop scheme for controlling the serial I/O of a system is
programmed for a string of length 7 bits, one parity bit (odd
parity) and one
stop bit. The transmission rate is 1000 bits/second.
i) What is the complete bit stream that is transmitted for the string
'0110101'?
ii) How many strings can be transmitted per second?
Solution
i) The complete bit stream that is transmitted for the string
'0110101' is 1011010101 as per requirement.
ii) Each string occupies 10 bits on the line (start bit, 7 data
bits, parity bit and stop bit), so the number of strings that can
be transmitted per second = 1000 / 10 = 100.

2. Consider a CRT display that has a text mode display format of
75 × 30 characters with a 9 × 12 character cell. What is the size of
the video buffer RAM for the display to be used in monochrome (1 bit
per pixel) graphics mode?
Solution
Number of bits required
= No. of characters × cell size
= 75 × 30 × 9 × 12
= 243000 bits (30375 bytes)
= size of RAM

3. The logical address space in a computer system consists of 128
segments of capacity 32 pages of 4K words each. The physical memory
consists of 4K page frames, each of 4K words capacity.
i) Formulate the logical and physical address formats.
ii) Give the block diagram for table translation.
Solution
Formation of logical address:
128 = 2^7, so 7 bits are required to address 128 segments.
32 = 2^5, so 5 bits are required to address 32 pages within each
segment.
Page size is 4K (2^12) words, so 12 bits are required to address a
word in the page.
Formation of physical address:
12 bits are required to address 4K page frames.
Page frame size is 4K (2^12) words, so 12 bits are required to
address a word in the page frame.

Logical address: Segment (7 bits) | Page (5 bits) | Word (12 bits)
Physical address: Block (12 bits) | Word (12 bits)

The logical address is partitioned into 3 fields. The segment field
specifies a segment number, the page field specifies a page within
the segment and the word field gives a specific word within the page.
A page field of k bits can specify up to 2^k pages. A segment number
may be associated with just one page or with as many as 2^k pages.
Thus the length of a segment can vary according to the number of
pages assigned to it.

The mapping of a logical address into a physical address is as shown
in the figure. The segment number of the logical address specifies
the address for the segment table. The entry in the segment table is
a pointer address for a page table base. The page table base is added
to the page number given in the logical address. The sum produces a
pointer address to an entry in the page table. The value found in the
page table provides the block number in the physical memory.
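
The segment-table and page-table lookup used in example 3 can be sketched as follows, assuming the 7/5/12-bit logical format and the 12/12-bit physical format derived in the solution. The function names and the table contents (segment 6, page table base 0x35, block 0x019) are made-up values chosen only to exercise the code.

```python
SEG_BITS, PAGE_BITS, WORD_BITS = 7, 5, 12   # field widths from example 3

def split_logical(addr):
    """Split a 24-bit logical address into (segment, page, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    page = (addr >> WORD_BITS) & ((1 << PAGE_BITS) - 1)
    segment = (addr >> (WORD_BITS + PAGE_BITS)) & ((1 << SEG_BITS) - 1)
    return segment, page, word

def translate(addr, segment_table, page_table):
    """Map a logical address to a 24-bit physical address.

    segment_table[segment] gives the page table base; the page number
    is added to that base to index page_table, which yields the block
    (page frame) number.
    """
    segment, page, word = split_logical(addr)
    page_table_base = segment_table[segment]
    block = page_table[page_table_base + page]
    return (block << WORD_BITS) | word

# Hypothetical table contents, for illustration only.
segment_table = {6: 0x35}
page_table = {0x35 + 2: 0x019}

logical = (6 << 17) | (2 << 12) | 0x07E     # segment 6, page 2, word 0x07E
print(split_logical(logical))
print(hex(translate(logical, segment_table, page_table)))
```

The word field passes through unchanged; only the segment and page fields are replaced by the block number found through the two tables.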
GATE QUESTIONS
Topics
1. CACHE AND MAIN MEMORY
2. INSTRUCTIONS-PIPELINING & ADDRESSING MODES
3. CPU CONTROL DESIGN & INTERFACES
4. SECONDARY MEMORY & DMA
1 CACHE AND MAIN MEMORY

Q.1 A graphics card has on board memory of 1 Mbyte. Which of the
following modes can the card not support?
a) 1600 x 400 resolution with 256 colors on a 17 inch monitor
b) 1600 x 400 resolution with 16 million colors on a 14 inch monitor
c) 800 x 400 resolution with 16 million colors on a 17 inch monitor
d) 800 x 800 resolution with 256 colors on a 14 inch monitor
[GATE-2000]

Q.2 Which of the following requires a device driver?
a) Register b) Cache
c) Main memory d) Disk
[GATE-2002]

Q.3 More than one word is put in one cache block to
a) exploit the temporal locality of reference in a program
b) exploit the spatial locality of reference in a program
c) reduce the miss penalty
d) None of the above
[GATE-2002]

Q.4 Increasing the RAM of a computer typically improves performance
because
a) virtual memory increases
b) larger RAMs are faster
c) fewer page faults occur
d) fewer segmentation faults occur
[GATE-2005]

Q.5 Consider a small two-way set-associative cache memory,
consisting of four blocks. For choosing the block to be replaced,
use the Least Recently Used (LRU) scheme. The number of cache misses
for the following sequence of block addresses is 8, 12, 0, 12, 8
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Statements for Linked Questions no 6 & 7
A CPU has a 32 Kbyte direct mapped cache with 128-byte block size.
Suppose A is a two-dimensional array of size 512 x 512 with elements
that occupy 8 bytes each. Consider the following two C code segments,
P1 and P2.
P1 :
for (i = 0; i < 512; i++) {
    for (j = 0; j < 512; j++) {
        x += A[i][j];
    }
}
P2 :
for (i = 0; i < 512; i++) {
    for (j = 0; j < 512; j++) {
        x += A[j][i];
    }
}
P1 and P2 are executed independently with the same initial state,
namely, the array A is not in the cache and i, j, x are in registers.
Let the number of cache misses experienced by P1 be M1 and that for
P2 be M2.

Q.6 The value of M1 is
a) Zero b) 2048
c) 16384 d) 262144
[GATE-2006]

Q.7 The value of the ratio M1/M2 is
a) Zero b) 1/16
c) 1/8 d) 16
[GATE-2006]
Common Data for Questions no 8 and 9
Consider two cache organizations:
The first one is 32 Kbyte 2-way set associative with 32 byte block
size. The second one is of the same size but direct mapped. The size
of an address is 32 bits in both cases. A 2-to-1 multiplexer has a
latency of 0.6 ns while a k-bit comparator has a latency of k/10 ns.
The hit latency of the set associative organization is h1 while that
of the direct mapped one is h2.

Q.8 The value of h1 is
a) 2.4 ns b) 2.3 ns
c) 1.8 ns d) 1.7 ns
[GATE-2006]

Q.9 The value of h2 is
a) 2.4 ns b) 2.3 ns
c) 1.8 ns d) 1.7 ns
[GATE-2006]

Statements for Linked Questions no 10 and 11
Consider a machine with a byte addressable main memory of 2^16 bytes.
Assume that a direct mapped data cache consisting of 32 lines of 64
bytes each is used in the system. A 50 x 50 two-dimensional array of
bytes is stored in the main memory starting from memory location
1100H. Assume that the data cache is initially empty. The complete
array is accessed twice. Assume that the contents of the data cache
do not change in between the two accesses.

Q.10 How many data cache misses will occur in total?
a) 48 b) 50
c) 56 d) 59
[GATE-2007]

Q.11 Which of the following lines of the data cache will be replaced
by new blocks in accessing the array for the second time?
a) line 4 to line 11 b) line 4 to line 12
c) line 0 to line 7 d) line 0 to line 8
[GATE-2007]

Q.12 Consider a 4-way set associative cache consisting of 128 lines
with a line size of 64 words. The CPU generates a 20-bit address of
a word in main memory. The number of bits in the TAG, LINE and WORD
fields are respectively
a) 9, 6, 5 b) 7, 7, 6
c) 7, 5, 8 d) 9, 5, 6
[GATE-2007]

Q.13 In an instruction execution pipeline, the earliest that the
data TLB (Translation Lookaside Buffer) can be accessed is
a) Before effective address calculation has started
b) During effective address calculation
c) After effective address calculation has completed
d) After data cache lookup has completed

Common Data for Questions no 14, 15 and 16
Consider a machine with a 2-way set associative data cache of size
64 Kbyte and block size 16 bytes. The cache is managed using 32 bit
virtual addresses and the page size is 4 Kbyte. A program to be run
on this machine begins as follows.
double ARR[1024][1024];
int i, j;
/* Initialize array ARR to 0.0 */
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;
The size of double is 8 bytes. Array ARR is located in memory
starting at the beginning of virtual page 0xFF000 and stored in row
major order. The cache is initially empty and no pre-fetching is
done. The only data memory references made by the program are those
to array ARR.

Q.14 The total size of the tags in the cache directory is
a) 32 Kbit b) 34 Kbit
c) 64 Kbit d) 68 Kbit
[GATE-2008]
Q.15 Which of the following array elements has the same cache index as
ARR[0][0]?
a) ARR[0][4] b) ARR[4][0]
c) ARR[0][5] d) ARR[5][0]
[GATE-2008]

Q.16 The cache hit ratio for this initialization loop is
a) 0% b) 25%
c) 50% d) 75%
[GATE-2008]

Q.17 For inclusion to hold between two cache levels L1 and L2 in a
multi-level cache hierarchy, which of the following are necessary?
1. L1 must be a write-through cache.
2. L2 must be a write-through cache.
3. The associativity of L2 must be greater than that of L1.
4. The L2 cache must be at least as large as the L1 cache.
a) 4 b) 1 and 4
c) 1, 2 and 4 d) 1, 2, 3 and 4
[GATE-2008]

Q.18 How many 32 k × 1 RAM chips are needed to provide a memory capacity
of 256 Kbyte?
a) 8 b) 32
c) 64 d) 128
[GATE-2009]

Q.19 A main memory unit with a capacity of 4 megabyte is built using
1 M × 1 bit DRAM chips. Each DRAM chip has 1 k rows of cells with 1 k
cells in each row. The time taken for a single refresh operation is
100 ns. The time required to perform one refresh operation on all the
cells in the memory unit is
a) 100 ns b) 100 * 2^10 ns
c) 100 * 2^20 ns d) 3200 * 2^20 ns
[GATE-2010]

Q.20 Consider a 4-way set associative cache (initially empty) with total
16 cache blocks. The main memory consists of 256 blocks and the request
for memory blocks is in the following order:
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
Which one of the following memory blocks will not be in cache if LRU
replacement policy is used?
a) 3 b) 8
c) 129 d) 216
[GATE-2010]

Statements for Linked Questions no 21 and 22
The computer system has an L1 cache, an L2 cache and a main memory unit
connected as shown below. The block size in L1 cache is 4 words. The
block size in L2 cache is 16 words. The memory access times are 2 ns,
20 ns and 200 ns, for L1 cache, L2 cache and main memory unit
respectively.

Q.21 When there is a miss in L1 cache and a hit in L2 cache, a block is
transferred from L2 cache to L1 cache. What is the time taken for this
transfer?
a) 2 ns b) 20 ns
c) 22 ns d) 88 ns
[GATE-2010]

Q.22 When there is a miss in both L1 cache and L2 cache, first a block
is transferred from main memory to L2 cache, and then a block is
transferred from L2 cache to L1 cache. What is the total time taken for
these transfers?
a) 222 ns b) 880 ns
c) 902 ns d) 968 ns
[GATE-2010]

Q.23 An 8 Kbyte direct mapped write-back cache is organized as multiple
blocks, each of size 32 byte. The processor generates 32-bit addresses.
The cache controller maintains the tag information for each cache block
comprising of the following:
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block mapped
in the cache.
What is the total size of memory needed at the cache controller to store
metadata (tags) for the cache?
a) 4864 bit b) 6144 bit
c) 6656 bit d) 5376 bit
[GATE-2011]

Q.24 In a k-way set associative cache, the cache is divided into v sets,
each of which consists of k lines. The lines of a set are placed in
sequence one after another. The lines in set s are sequenced before the
lines in set (s + 1). The main memory blocks are numbered 0 onwards. The
main memory block numbered j must be mapped to any one of the cache
lines from
a) (j mod v) * k to (j mod v) * k + (k - 1)
b) (j mod v) to (j mod v) + (k - 1)
c) (j mod k) to (j mod k) + (v - 1)
d) (j mod k) * v to (j mod k) * v + (v - 1)
[GATE-2013]

Q.25 A RAM chip has a capacity of 1024 words of 8 bits each (1K × 8).
The number of 2 × 4 decoders with enable line needed to construct a
16K × 16 RAM from 1K × 8 RAM is
a) 4 b) 5
c) 6 d) 7
[GATE-2013]

Q.26 An access sequence of cache block addresses is of length N and
contains n unique block addresses. The number of unique block addresses
between two consecutive accesses to the same block address is bounded
above by k. What is the miss ratio if the access sequence is passed
through a cache of associativity A ≥ k exercising least-recently-used
replacement policy?
a) n/N b) 1/N
c) 1/A d) k/n
[GATE-2014-1]

Q.27 A 4-way set-associative cache memory unit with a capacity of 16 KB
is built using a block size of 8 words. The word length is 32 bits. The
size of the physical address space is 4 GB. The number of bits for the
TAG field is _____
[GATE-2014-2]

Q.28 In designing a computer's cache system, the cache block (or cache
line) size is an important parameter. Which one of the following
statements is correct in this context?
a) A smaller block size implies better spatial locality
b) A smaller block size implies a smaller cache tag and hence lower
   cache tag overhead
c) A smaller block size implies a larger cache tag and hence lower
   cache hit time
d) A smaller block size incurs a lower cache miss penalty
[GATE-2014-2]

Q.29 If the associativity of a processor cache is doubled while keeping
the capacity and block size unchanged, which one of the following is
guaranteed to be NOT affected?
a) Width of tag comparator
b) Width of set index decoder
c) Width of way selection multiplexor
d) Width of processor to main memory data bus
[GATE-2014-2]

Q.30 The memory access time is 1 nanosecond for a read operation with a
hit in cache, 5 nanoseconds for a read operation with a miss in cache,
2 nanoseconds for a write operation with a hit in cache and 10
nanoseconds for a write operation with a miss in cache. Execution of a
sequence of instructions involves 100 instruction fetch operations, 60
memory operand read operations and 40 memory operand write operations.
The cache hit-ratio is 0.9. The average memory access time (in
nanoseconds) in executing the sequence of instructions is __________.
[GATE-2014-3]

Q.31 Assume that for a certain processor, a read request takes 50
nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose
while running a program, it was observed that 80% of the processor's
read requests result in a cache hit. The average read access time in
nanoseconds is _______.
[GATE-2015-2]

Q.32 Consider a machine with byte addressable main memory of 2^20 bytes,
block size of 16 bytes and a direct mapped cache having 2^12 cache
lines. Let the address of two consecutive bytes in main memory be
(E201F)16 and (E2020)16. What are the tag and cache line address (in
hex) for main memory address (E201F)16?
a) E, 201 b) F, 201
c) E, E20 d) 2, 01F
[GATE-2015-3]

Q.33 A processor can support a maximum memory of 4 GB, where the memory
is word addressable (a word consists of two bytes). The size of the
address bus of the processor is at least _________ bits.
[GATE-2016-1]

Q.34 The width of the physical address on a machine is 40 bits. The
width of the tag field in a 512 KB 8-way set associative cache is
_________ bits.
[GATE-2016-2]

Q.35 A file system uses an in-memory cache to cache disk blocks. The miss
rate of the cache is shown in the figure. The latency to read a block
from the cache is 1 ms and to read a block from the disk is 10 ms.
Assume that the cost of checking whether a block exists in the cache is
negligible. Available cache sizes are in multiples of 10 MB.
The smallest cache size required to ensure an average read latency of
less than 6 ms is _________ MB.
[GATE-2016-2]

ANSWER KEY:
 1 (b)    2 (b)    3 (a)    4 (c)    5 (c)    6 (c)    7 (b)
 8 (a)    9 (b)   10 (c)   11 (a)   12 (b)   13 (b)   14 (b)
15 (b)   16 (b)   17 (a)   18 (c)   19 (d)   20 (d)   21 (c)
22 (a)   23 (d)   24 (a)   25 (b)   26 (a)   27 20    28 (d)
29 (d)   30 1.68  31 14    32 (a)   33 31    34 24    35 30
EXPLANATIONS
Q.1 (b)
A graphics card with on-board memory of 1 Mbyte cannot support a mode of
1600×400 resolution with 16 million colors on a 14 inch monitor.

Q.2 (b)
In computing, a device driver or software driver is a computer program
allowing higher-level computer programs to interact with a hardware
device. A driver typically communicates with the device through the
computer bus or communications subsystem to which the hardware connects.
When a calling program invokes a routine in the driver, the driver
issues commands to the device. Once the device sends data back to the
driver, the driver may invoke routines in the original calling program.
Drivers are hardware dependent and operating system specific. They
usually provide the interrupt handling required for any necessary
asynchronous time-dependent hardware interface.

Q.3 (a)
It is done to exploit the temporal locality of reference in a program,
as the cache is the fastest memory available and temporarily holds a
mirror image of part of main memory, storing more than one word at a
time.

Q.4 (c)
The number of page frames is directly proportional to the size of main
memory. So when the RAM is increased, the main memory grows and the
number of page frames also increases, which results in a reduction of
swapping. Thus, fewer page faults occur. With any replacement rule that
does not suffer from Belady's anomaly, more frames always result in a
reduction of page faults.

Q.5 (c)
The page frame contents after applying LRU for the sequence 8, 12, 0,
12, 8 give the total number of misses = 4.

Q.6 (c)
16 array elements are brought into the cache when the first element
A[0][0] is accessed, so there will be hits for the next 15 accesses
A[0][1] to A[0][15], which are in the cache, and a miss at A[0][16].
Therefore, there are 15 hits and one miss for every block, and
512 × 512/16 = 16384 block transfers (misses) occur during P1.

Q.7 (b)
As the next element required to be accessed after A[0][0] is A[1][0],
the elements A[0][1] to A[0][15] brought into the cache are of no use.
Thus, there will be 262144 (512 × 512) misses and no hits.
Therefore, M1/M2 = 16384/262144 = 1/16.

Q.8 (a)
From the given data, the following is concluded:
Therefore, h1 = 18/10 + 0.6 = 2.4 ns

Q.9 (b)
From the given data, the following is concluded:
Therefore, h2 = 17/10 + 0.6 = 2.3 ns

Q.10 (c)
The number of data cache misses that occur in total is 56, as is clear
from the given data.

Q.11 (a)
The lines that will be replaced by the new blocks are lines 4 to 11.

Q.12 (b)
7 bits are required if there are 128 lines, because 128 = 2^7.
Each line is of 64 words, or 2^6 words. Hence, the number of bits
required for the word field is 6. As given, a 20 bit address is
generated for a word in main memory, so the bits required for the tag
= 20 - (7 + 6) = 20 - 13 = 7 bits.

Q.13 (b)
The translation lookaside buffer can be accessed earliest during
effective address calculation.

Q.14 (b)
From the given conditions, the total size of the tags comes out to be
17 × 2 × 1024 bits = 34 Kbit.

Q.15 (b)
The array element ARR[4][0] has the same cache index as ARR[0][0],
since it is given that the page size is 4 Kbyte and 1 row contains 1024
elements, i.e., 2^10 locations.

Q.16 (b)
The hit ratio is given as
Number of hits/(Number of hits + Number of misses)
= 4/16 = 25%

Q.17 (a)
For a multilevel cache hierarchy, the condition that is necessary for
inclusion to hold is that the L2 cache must be at least as large as the
L1 cache. Both levels being write-through caches is not a sufficient
condition, as is depicted by the following figure:

Q.18 (c)
As given, the basic RAM is 32 k × 1 and we have to design a RAM of
256 k × 8.
Therefore, number of chips required = (256 k × 8)/(32 k × 1)
= (256 × 1024 × 8)/(32 × 1024 × 1)
= 64 = 8 × 8
This means 64 = 8 parallel lines × 8 serial RAM chips.

Q.19 (d)
Since the capacity is 4 MB = 4 M × 8 bit and each chip is 1 M × 1 bit,
number of chips = (4 M × 8)/(1 M × 1) = 32 ...A
Cells per chip = 1 k × 1 k (rows × cells) = 2^20 ...B
Therefore, the time required to perform one refresh operation on all the
cells in the memory unit is A × B × 100 ns
= 32 × 2^20 × 100 ns = 3200 × 2^20 ns

Q.20 (d)
Final cache contents (x -> y means block x was later replaced by y):
Set 0: 0 -> 48, 4 -> 32, 8, 216 -> 92
Set 1: 1, 133, 73, 129
Set 2: (empty)
Set 3: 255 -> 155, 3, 159, 63
0 mod 4 = 0 (set 0)
255 mod 4 = 3 (set 3)
The sets in the above table are determined in the same way for the other
values. Block 216 is replaced by block 92, so it is not in the cache.

Q.21 (c)
As already given in the question,
Memory access time for L1 = 2 ns
Memory access time for L2 = 20 ns
Now the required time of transfer = 20 + 2 = 22 ns

Q.22 (a)
As given,
Memory access time for main memory = 200 ns
Memory access time for L1 = 2 ns
Memory access time for L2 = 20 ns
Total access time = Block transfer time from main memory to L2 cache
+ Access time of L2 + Access time of L1
Now, the required time of transfer = 200 + 20 + 2 = 222 ns

Q.23 (d)
Number of blocks in cache = (8 × 2^10 × 8)/(32 × 8) = 2^13/2^5 = 2^8
8 bits are needed to identify each block.
The size of a block is 32 bytes. Thus 5 bits will be needed to identify
a byte within a line.
Tag = 32 - (8 + 5) = 19 bits. With the valid and modified bits, each
block needs 19 + 2 = 21 bits, so the total is 2^8 × 21 = 5376 bits.

Q.24 (a)
Consider the case in which the main memory blocks map to the sets as
follows (one row per set):
Set 0: 0 3 6 9 12
Set 1: 1 4 7 10 13
Set 2: 2 5 8 11 14
The cache is divided into v = 3 sets, each of which consists of k = 5
lines, so set 0 holds lines 0 to 4, set 1 holds lines 5 to 9 and set 2
holds lines 10 to 14.
For main memory block j = 9, which belongs to set 0:
a) (j mod v) * k to (j mod v) * k + (k - 1)
   = (9 mod 3) * 5 to (9 mod 3) * 5 + (5 - 1)
   = 0 to 4
For main memory block j = 11, which belongs to set 2 (lines 10 to 14):
b) (j mod v) to (j mod v) + (k - 1)
   = (11 mod 3) to (11 mod 3) + 4
   = 2 to 6
c) (j mod k) to (j mod k) + (v - 1)
   = (11 mod 5) to (11 mod 5) + 2
   = 1 to 3
Neither b) nor c) gives lines 10 to 14, so (a) is correct.

Q.25 (b)
RAM chip size = 1k × 8 [1024 words of 8 bits each]
RAM to construct = 16k × 16
Number of chips required = (16k × 16)/(1k × 8)
= 16 × 2 [16 chips vertically, each row having 2 chips horizontally]
So to select one chip out of 16 vertical chips, we need a 4 × 16
decoder.
Available decoder is a 2 × 4 decoder.
To be constructed is a 4 × 16 decoder.
So we need five 2 × 4 decoders in total to construct a 4 × 16 decoder.

Q.26 (a)
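
The line range used in Q.24 can be checked with a short Python sketch. The function name candidate_lines is an illustrative assumption, not part of the question; it simply evaluates option (a) for a cache with v sets of k lines each.

```python
def candidate_lines(j, v, k):
    """Return (first, last) cache line that main memory block j may occupy
    in a k-way set associative cache with v sets laid out set by set."""
    first = (j % v) * k          # first line of the set that j maps to
    return first, first + k - 1  # the set spans k consecutive lines


# Values from the explanation of Q.24: v = 3 sets, k = 5 lines per set.
print(candidate_lines(9, 3, 5))
print(candidate_lines(11, 3, 5))
```

With v = 3 and k = 5, block 9 falls in set 0 and block 11 in set 2, which is why only option (a) yields the right line range for both.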
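
The LRU trace in Q.20 can also be reproduced by simulation. The following is a minimal sketch, assuming blocks map to sets by (block number mod number of sets); the helper name simulate_lru is hypothetical.

```python
from collections import OrderedDict


def simulate_lru(requests, num_blocks, ways):
    """Simulate a set associative cache with LRU replacement.

    Returns one list per set, ordered from least to most recently used.
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    for block in requests:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # miss in a full set: evict the LRU block
            lines[block] = True
    return [list(lines) for lines in sets]


# Request sequence from Q.20: 16 cache blocks, 4-way set associative.
requests = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
final_sets = simulate_lru(requests, num_blocks=16, ways=4)
cached = {block for lines in final_sets for block in lines}
print(final_sets)
print(216 in cached)
```

Under these assumptions block 216 is evicted from set 0 when block 92 arrives, while blocks 3, 8 and 129 remain cached.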
Q.27 (20)
Physical address size = 32 bits
Cache size = 16k bytes = 2^14 bytes
Block size = 8 words = 8 × 4 bytes = 32 bytes
(where each word = 4 bytes)
No. of blocks = 2^14/2^5 = 2^9, so a block index needs 9 bits
No. of sets = 2^9/4 = 2^7
Set offset = 7 bits
Byte offset: 8 × 4 bytes = 32 bytes = 2^5, so 5 bits
TAG = 32 - (7 + 5) = 20 bits

Q.28 (d)
When the cache block size is smaller, the cache can accommodate a larger
number of blocks, and less data has to be transferred on each miss, so
the miss penalty for the cache will be lowered.

Q.29 (d)
When the associativity is doubled, the set offset is affected, and
accordingly the number of bits used for the TAG comparator is affected.
The width of the set index decoder is also affected when the set offset
changes. The width of the way selection multiplexer is affected when the
block offset changes. The width of the processor to main memory data bus
is guaranteed to be NOT affected.

Q.30 (1.68)
Total operations = 100 instruction fetch operations + 60 memory operand
read operations + 40 memory operand write operations
= 200 operations
Time taken for fetching 100 instructions (equivalent to read)
= 90 × 1 ns + 10 × 5 ns = 140 ns
Memory operand read operations
= 90%(60) × 1 ns + 10%(60) × 5 ns
= 54 ns + 30 ns = 84 ns
Memory operand write operations
= 90%(40) × 2 ns + 10%(40) × 10 ns
= 72 ns + 40 ns = 112 ns
Total time taken for executing the 200 operations
= 140 + 84 + 112 = 336 ns
∴ Average memory access time = 336 ns/200 = 1.68 ns

Q.31 (14)
Average read access time = [(0.8)(5) + (0.2)(50)] ns
= 4 + 10 = 14 ns

Q.32 (a)
Address format: TAG | cache block | word offset
With 2^20 bytes of memory, 2^12 cache lines and 16 byte blocks, the
20-bit address has a 4-bit tag, a 12-bit cache line and a 4-bit word
offset. So (E201F)16 splits into tag = E, cache line = 201 and word
offset = F.

Q.33 (31)
Memory size = 4 GB = 2^32 bytes
Word size = 2 bytes
∴ No. of addresses = Memory size/Word size = 2^32 bytes/2 bytes = 2^31,
which needs 31 address bits.

Q.34 (24)
Cache size = 512 KB = 2^19 bytes and associativity = 8 = 2^3, so the set
index and block offset together take 19 - 3 = 16 bits.
Tag bits = 40 - (19 - 3) = 24 bits

Q.35 (30)
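
The weighted averages in Q.30 and Q.31 can be verified numerically. This is a small sketch under the assumption that every operation type shares the stated hit ratio; the function name average_access_time and the tuple layout are illustrative choices.

```python
def average_access_time(groups):
    """Average memory access time in ns.

    groups: list of (count, hit_ratio, hit_ns, miss_ns) tuples,
    one per kind of memory operation.
    """
    total_ops = sum(count for count, _, _, _ in groups)
    total_ns = sum(
        count * (hit_ratio * hit_ns + (1 - hit_ratio) * miss_ns)
        for count, hit_ratio, hit_ns, miss_ns in groups
    )
    return total_ns / total_ops


# Q.30: 100 fetches and 60 operand reads (1 ns hit, 5 ns miss),
# 40 operand writes (2 ns hit, 10 ns miss), hit ratio 0.9.
q30 = average_access_time([(100, 0.9, 1, 5), (60, 0.9, 1, 5), (40, 0.9, 2, 10)])

# Q.31: reads only, 80% hits at 5 ns and 20% misses at 50 ns.
q31 = average_access_time([(1, 0.8, 5, 50)])

print(round(q30, 2), round(q31, 2))
```

The two results agree with the answers 1.68 ns and 14 ns derived above.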
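
The tag widths in Q.23, Q.27 and Q.34 all follow from one relation: the set index and block offset together address one way of the cache, so tag bits = address bits - log2(cache size / associativity). A minimal sketch, assuming all sizes are powers of two (the function name tag_bits is hypothetical):

```python
def tag_bits(address_bits, cache_bytes, ways):
    """Tag width for a set associative cache whose sizes are powers of two."""
    bytes_per_way = cache_bytes // ways
    index_plus_offset = bytes_per_way.bit_length() - 1  # exact log2 for powers of two
    return address_bits - index_plus_offset


q27 = tag_bits(32, 16 * 1024, 4)    # 16 KB, 4-way, 32-bit address
q34 = tag_bits(40, 512 * 1024, 8)   # 512 KB, 8-way, 40-bit address
q23 = tag_bits(32, 8 * 1024, 1)     # 8 KB direct mapped, 32-bit address

# Q.23 metadata: tag plus valid and modified bits for each of the 256 blocks.
q23_metadata_bits = (q23 + 2) * (8 * 1024 // 32)

print(q27, q34, q23, q23_metadata_bits)
```

Note that the block size cancels out of the tag width, which is why Q.34 can be answered without it.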
2 INSTRUCTIONS-PIPELINING & ADDRESSING MODES
Q.1 The most appropriate matching for the following pairs
X: Indirect addressing
Y: Immediate addressing
Z: Autodecrement addressing
1: Loops
2: Pointers
3: Constants
a) X-3, Y-2, Z-1 b) X-1, Y-3, Z-2
c) X-2, Y-3, Z-1 d) X-3, Y-1, Z-2
[GATE-2002]

Q.2 Comparing the time T1 taken for a single instruction on a pipelined
CPU with time T2 taken on a non-pipelined but identical CPU, we can say
that
a) T1 ≤ T2
b) T1 ≥ T2
c) T1 < T2
d) T1 is T2 plus the time taken for one instruction fetch cycle
[GATE-2002]

Q.3 Which of the following is not a form of memory?
a) Instruction cache
b) Instruction register
c) Instruction opcode
d) Translation lookaside buffer
[GATE-2002]

Q.4 In 2's complement addition, overflow
a) is flagged whenever there is carry from sign bit addition
b) cannot occur when a positive value is added to a negative value
c) is flagged when the carries from sign bit and previous bit match
d) None of the above
[GATE-2002]

Q.5 Which of the following addressing modes are suitable for program
relocation at run time?
1. Absolute addressing
2. Relative addressing
3. Based addressing
4. Indirect addressing
a) 1 and 4 b) 1 and 2
c) 2 and 3 d) 1, 2 and 4
[GATE-2004]

Q.6 For a pipelined CPU with a single ALU, consider the following
situations
1. The (j+1)st instruction uses the result of the jth instruction as an
   operand.
2. The execution of a conditional jump instruction.
3. The jth and (j+1)st instructions require the ALU at the same time.
Which of the above can cause a hazard?
a) 1 and 2 b) 2 and 3
c) 3 only d) All
[GATE-2004]

Q.7 The performance of a pipelined processor suffers if
a) the pipeline stages have different delays
b) consecutive instructions are dependent on each other
c) the pipeline stages share hardware resources
d) All of the above
[GATE-2006]

Common Data for Questions no 8 and 9
Consider the following assembly language program for a hypothetical
processor. A, B and C are 8 bit registers. The meanings of various
instructions are shown as comments.
    MOV B, #0 ; B ← 0
    MOV C, #8 ; C ← 8
Z:  CMP C, #0 ; compare C with 0
    JZ X      ; jump to X if zero flag is set
    SUB C, #1 ; C ← C - 1
    RRC A, #1 ; right rotate A through carry by one bit. Thus: if the
              ; initial values of A and the carry flag are A7.....A0 and
              ; C0 respectively, their values after the execution of
              ; this instruction will be C0 A7.....A1 and A0
              ; respectively.
    JC Y      ; jump to Y if carry flag is set
    JMP Z     ; jump to Z
Y:  ADD B, #1 ; B ← B + 1
    JMP Z     ; jump to Z
X:

Q.8 If the initial value of register A is A0, the value of register B
after the program execution will be
a) the number of 0 bits in A0
b) the number of 1 bits in A0
c) A0
d) 8
[GATE-2006]

Q.9 Which of the following instructions when inserted at location X will
ensure that the value of register A after program execution is the same
as its initial value?
a) RRC A, #1
b) NOP ; no operation
c) LRC A, #1 ; left rotate A through carry flag by one bit
d) ADD A, #1
[GATE-2006]

Q.10 A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 ns
respectively. Registers that are used between the stages have a delay of
5 ns each. Assuming constant clocking rate, the total time taken to
process 1000 data items on this pipeline will be
a) 120.4 μs b) 160.5 μs
c) 165.5 μs d) 590.0 μs
[GATE-2006]

Common Data for Questions no 11 and 12
Consider the following program segment for a hypothetical CPU having
three user registers R1, R2 and R3.

Instruction      Operation              Instruction size (in words)
MOV R1, 5000     R1 ← Memory[5000]      2
MOV R2, R1       R2 ← Memory[R1]        1
ADD R2, R3       R2 ← R2 + R3           1
MOV 6000, R2     Memory[6000] ← R2      2
HALT             Machine halts          1
[GATE-2006]

Q.11 Consider that the memory is byte addressable with word size 32 bits,
and the program has been loaded starting from memory location 1000
(decimal). If an interrupt occurs while the CPU has been halted after
executing the HALT instruction, the return address (in decimal) saved in
the stack will be
a) 1007 b) 1020
c) 1024 d) 1028
[GATE-2006]

Q.12 Let the clock cycles required for various operations be as follows:
Register to/from memory transfer : 3 clock cycles
Add with both operands in register : 1 clock cycle
Instruction fetch and decode : 2 clock cycles per word
The total number of clock cycles required to execute the program is
a) 29 b) 24
c) 23 d) 20
[GATE-2006]

Statements for Linked Questions no 13 and 14
Consider the following data path of a CPU:
The ALU, the bus and all the registers in the data path are of identical
size.
All operations including incrementation of the PC and the GPRs are to be
carried out in the ALU. Two clock cycles are needed for memory read
operation, the first one for loading address in the MAR and the next one
for loading data from the memory bus into the MDR.
[GATE-2006]

Q.13 The instruction "ADD R0, R1" has the register transfer
interpretation R0 <= R0 + R1. The minimum number of clock cycles needed
for execution cycle of this instruction is
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Q.14 The instruction "CALL Rn, sub" is a two word instruction. Assuming
that PC is incremented during the fetch cycle of the first word of the
instruction, its register transfer interpretation is
Rn <= PC + 1
PC <= M[PC]
The minimum number of CPU clock cycles needed during the execution cycle
of this instruction is
a) 2 b) 3
c) 4 d) 5
[GATE-2006]

Q.15 A 5 stage pipelined CPU has the following sequence of stages.
IF : Instruction fetch from instruction memory
RD : Instruction decode and register read
EX : Execute : ALU operation for data and address computation
MA : Data memory access : for write access, the register read at RD
     stage is used
WB : Register write back
Consider the following sequence of instructions:
I1 : L R0, loc1 ; R0 <= M[loc1]
I2 : A R0, R0   ; R0 <= R0 + R0
I3 : S R2, R0   ; R2 <= R2 - R0
Let each stage take one clock cycle.
What is the number of clock cycles taken to complete the above sequence
of instructions starting from the fetch of I1?
a) 8 b) 10
c) 12 d) 15
[GATE-2006]

Q.16 Consider a three word machine instruction:
ADD A[R0], @B
The first operand (destination) "A[R0]" uses indexed addressing mode
with R0 as the index register. The second operand (source) "@B" uses
indirect addressing mode. A and B are memory addresses residing at the
second and the third words, respectively. The first word of the
instruction specifies the opcode, the index register designation and the
source and destination addressing modes. During execution of ADD
instruction, the two operands are added and stored in the destination
(first operand).
The number of memory cycles needed during the execution cycle of the
instruction is
a) 3 b) 4
c) 5 d) 6
[GATE-2006]

Q.17 Match each of the high level language statements given on the left
hand side with the most natural addressing mode from those listed on the
right hand side.
List I                    List II
P. A[I] = B[J];           1. Indirect addressing
Q. while (*A++);          2. Indexed addressing
R. int temp = *x;         3. Auto increment
a) P-3, Q-2, R-1 b) P-1, Q-2, R-3
c) P-2, Q-3, R-1 d) P-1, Q-2, R-3
[GATE-2006]
Q.18 A CPU has a five-stage pipeline and runs at 1 GHz frequency.
Instruction fetch happens in the first stage of the pipeline. A
conditional branch instruction computes the target address and evaluates
the condition in the third stage of the pipeline. The processor stops
fetching new instructions following a conditional branch until the
branch outcome is known. A program executes 10^9 instructions out of
which 20% are conditional branches. If each instruction takes one cycle
to complete on average, the total execution time of the program is
a) 1.0 s b) 1.2 s
c) 1.4 s d) 1.6 s
[GATE-2006]

Q.19 Consider a new instruction named branch-on-bit-set (mnemonic bbs).
The instruction "bbs reg, pos, label" jumps to label if bit in position
pos of register operand reg is one. A register is 32 bits wide and the
bits are numbered 0 to 31, bit in position 0 being the least
significant. Consider the following emulation of this instruction on a
processor that does not have bbs implemented.
temp ← reg and mask
Branch to label if temp is non-zero
The variable temp is a temporary register. For correct emulation, the
variable mask must be generated by
a) mask ← 0x1 << pos
b) mask ← 0xffffffff >> pos
c) mask ← pos
d) mask ← 0xf
[GATE-2006]

Q.20 A CPU has 24-bit instructions. A program starts at address 300 (in
decimal). Which one of the following is a legal program counter (all
values in decimal)?
a) 400 b) 500
c) 600 d) 700
[GATE-2007]

Common Data for Questions no 20, 21 and 22
Consider the following program segment. Here R1, R2 and R3 are the
general purpose registers.

Instruction           Operation             Instruction size (number of words)
      MOV R1, 3000    R1 ← M[3000]          2
LOOP: MOV R2, R3      R2 ← M[R3]            1
      ADD R2, R1      R2 ← R1 + R2          1
      MOV R3, R2      M[R3] ← R2            1
      INC R3          R3 ← R3 + 1           1
      DEC R1          R1 ← R1 - 1           1
      BNZ LOOP        Branch on not zero    2
      HALT            Stop                  1

Assume that the content of memory location 3000 is 10 and the content of
the register R3 is 2000. The content of each of the memory locations
from 2000 to 2010 is 100. The program is loaded from the memory location
1000. All the numbers are in decimal.

Q.20 Assume that the memory is word addressable. The number of memory
references for accessing the data in executing the program completely is
a) 10 b) 11
c) 20 d) 21
[GATE-2007]

Q.21 Assume that the memory is word addressable. After the execution of
this program, the content of memory location 2010 is
a) 100 b) 101
c) 102 d) 110
[GATE-2007]

Q.22 Assume that the memory is byte addressable and the word size is 32
bit. If an interrupt occurs during the execution of the instruction
INC R3, what return address will be pushed on to the stack?
a) 1005 b) 1020
c) 1024 d) 1040
[GATE-2007]

Q.23 Consider a pipelined processor with the following four stages
IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back
The IF, ID and WB stages take 1 clock cycle each to complete the
operation. The number of clock cycles for the EX stage depends on the
instructions. The ADD and SUB instructions need 1 clock cycle and the
MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding
is used in the pipelined processor.
What is the number of clock cycles taken to complete the following
sequence of instructions?
ADD R2, R1, R0    R2 ← R1 + R0
MUL R4, R3, R2    R4 ← R3 * R2
SUB R6, R5, R4    R6 ← R5 - R4
a) 7 b) 8
c) 10 d) 14
[GATE-2007]

Statements for Linked Questions no 24 and 25
Delayed branching can help in the handling of control hazards.

Q.24 For all delayed conditional branch instructions, irrespective of
whether the condition evaluates to true or false
a) the instruction following the conditional branch instruction in
   memory is executed
b) the first instruction in the fall through path is executed
c) the first instruction in the taken path is executed
d) the branch takes longer to execute than any other instruction
[GATE-2008]

Q.25 The following code is to run on a pipelined processor with one
branch delay slot:
I1 : ADD R2 ← R7 + R8
I2 : SUB R4 ← R5 - R6
I3 : ADD R1 ← R2 + R3
I4 : STORE Memory[R4] ← R1
     BRANCH to Label if R1 == 0
Which of the instructions I1, I2, I3 or I4 can legitimately occupy the
delay slot without any other program modification?
a) I1 b) I2
c) I3 d) I4
[GATE-2008]

Q.26 Which of the following must be true for the RFE (Return from
Exception) instruction on a general purpose processor?
1. It must be a TRAP instruction.
2. It must be a privileged instruction.
3. An exception cannot be allowed to occur during execution of an RFE
   instruction.
a) 1 only b) 2 only
c) 1 and 2 d) 1, 2 and 3
[GATE-2008]

Q.27 Consider a 4 stage pipeline processor. The number of cycles needed
by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is
shown below.
      S1  S2  S3  S4
I1    2   1   1   1
I2    1   3   2   2
I3    2   1   1   3
I4    1   2   2   2
What is the number of cycles needed to execute the following loop?
for (i = 1 to 2) { I1; I2; I3; I4; }
a) 16 b) 23
c) 28 d) 30
[GATE-2010]

Q.28 A 5-stage pipelined processor has Instruction Fetch (IF),
Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and
Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock
cycle
each for any instruction. The PO stage takes 1 clock cycle for ADD and
SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles
for DIV instruction respectively. Operand forwarding is used in the
pipeline. What is the number of clock cycles needed to execute the
following sequence of instructions?
Instruction             Meaning of instruction
I0 : MUL R2, R0, R1     R2 ← R0 * R1
I1 : DIV R5, R3, R4     R5 ← R3 / R4
I2 : ADD R2, R5, R2     R2 ← R5 + R2
I3 : SUB R5, R2, R6     R5 ← R2 - R6
a) 13 b) 15
c) 17 d) 19
[GATE-2010]

Q.29 Consider an instruction pipeline with four stages (S1, S2, S3 and
S4) each with combinational circuit only. The pipeline registers are
required between each stage and at the end of the last stage. Delays for
the stages and for the pipeline registers are as given in the figure.
What is the approximate speed up of the pipeline in steady state under
ideal conditions when compared to the corresponding non-pipeline
implementation?
a) 4.0 b) 2.5
c) 1.1 d) 3.0
[GATE-2011]

Q.30 Consider a hypothetical processor with an instruction of type
LW R1, 20(R2), which during execution reads a 32-bit word from memory
and stores it in a 32 bit register R1. The effective address of the
memory location is obtained by the addition of constant 20 and the
contents of register R2. Which of the following best reflects the
addressing mode implemented by this instruction for the operand in
memory?
a) Immediate addressing
b) Register addressing
c) Register indirect scaled addressing
d) Base indexed addressing
[GATE-2011]

Q.31 Register renaming is done in pipelined processors
a) as an alternate to register allocation at compile time
b) for efficient access to function parameters and local variables
c) to handle certain kinds of hazards
d) as part of address translation
[GATE-2012]

Q.32 Consider the following sequence of micro-operations.
MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Which one of the following is a possible operation performed by this
sequence?
a) Instruction fetch
b) Operand fetch
c) Conditional branch
d) Initiation of interrupt service
[GATE-2013]

Q.33 Consider an instruction pipeline with five stages without any
branch prediction: Fetch Instruction (FI), Decode Instruction (DI),
Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The
stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and
6 ns, respectively. There are intermediate storage buffers after each
stage and the delay of each buffer is 1 ns. A program consisting of 12
instructions I1, I2, I3, ..., I12 is
executed in this pipelined processor. Instruction I4 is the only branch
instruction and its branch target is I9. If the branch is taken during
the execution of this program, the time (in ns) needed to complete the
program is
a) 132 b) 165
c) 176 d) 328
[GATE-2013]
Q.34 A machine has a 32-bit architecture, with 1-word long instructions.
It has 64 registers, each of which is 32 bits long. It needs to support
45 instructions, which have an immediate operand in addition to two
register operands. Assuming that the immediate operand is an unsigned
integer, the maximum value of the immediate operand is ______.
[GATE-2014-1]
Q.35 Consider a 6-stage instruction pipeline, where all stages are
perfectly balanced. Assume that there is no cycle-time overhead of
pipelining. When an application is executing on this 6-stage pipeline,
the speedup achieved with respect to non-pipelined execution if 25% of
the instructions incur 2 pipeline stall cycles is ______.
[GATE-2014-1]
Q.36 Consider two processors P1 and P2 executing the same instruction
set. Assume that under identical conditions, for the same input, a
program running on P2 takes 25% less time but incurs 20% more CPI
(clock cycles per instruction) as compared to the program running on
P1. If the clock frequency of P1 is 1 GHz, then the clock frequency of
P2 (in GHz) is ______.
[GATE-2014-1]
Q.37 Consider the following processors (ns stands for nanoseconds).
Assume that the pipeline registers have zero latency.
P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.
Which processor has the highest peak clock frequency?
a) P1 b) P2
c) P3 d) P4
[GATE-2014-3]
Q.38 An instruction pipeline has five stages, namely, instruction fetch
(IF), instruction decode and register fetch (ID/RF), instruction
execution (EX), memory access (MEM), and register write back (WB) with
stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns
stands for nanoseconds). To gain in terms of frequency, the designers
have decided to split the ID/RF stage into three stages (ID, RF1, RF2)
each of latency 2.2/3 ns. Also, the EX stage is split into two stages
(EX1, EX2) each of latency 1 ns. The new design has a total of eight
pipeline stages. A program has 20% branch instructions which execute in
the EX stage and produce the next instruction pointer at the end of the
EX stage in the old design and at the end of the EX2 stage in the new
design. The IF stage stalls after fetching a branch instruction until
the next instruction pointer is computed. All instructions other than
the branch instruction have an average CPI of one in both the designs.
The execution times of this program on the old and the new design are P
and Q nanoseconds, respectively. The value of P/Q is ______.
[GATE-2014-3]
Q.39 For computers based on three-address instruction formats, each
address field can be used to specify which of the following:
S1: A memory operand
S2: A processor register
S3: An implied accumulator register
a) Either S1 or S2 b) Either S2 or S3
c) Only S2 and S3 d) All of S1, S2 and S3
[GATE-2015-1]
Q.40 Consider a non-pipelined processor with a clock rate of 2.5
gigahertz and average cycles per instruction of four. The same processor
is upgraded to a pipelined processor with five stages; but due to the
internal pipeline delay, the clock speed is reduced to 2 gigahertz.
Assume that there are no stalls in the pipeline. The speed up achieved
in this pipelined processor is ______.
[GATE-2015-1]
Q.41 Consider the sequence of machine instructions given below:
MUL R5, R0, R1
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4
In the above sequence, R0 to R8 are general purpose registers. In the
instructions shown, the first register stores the result of the
operation performed on the second and the third registers. This sequence
of instructions is to be executed in a pipelined instruction processor
with the following 4 stages: (1) Instruction Fetch and Decode (IF),
(2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back
the result (WB). The IF, OF and WB stages take 1 clock cycle each for
any instruction. The PO stage takes 1 clock cycle for ADD or SUB
instruction, 3 clock cycles for MUL instruction and 5 clock cycles for
DIV instruction. The pipelined processor uses operand forwarding from
the PO stage to the OF stage. The number of clock cycles taken for the
execution of the above sequence of instructions is ______.
[GATE-2015-2]
Q.42 Consider a processor with byte-addressable memory. Assume that all
registers, including Program Counter (PC) and Program Status Word
(PSW), are of size 2 bytes. A stack in the main memory is implemented
from memory location (0100)16 and it grows upward. The stack pointer
(SP) points to the top element of the stack. The current value of SP is
(016E)16. The CALL instruction is of two words, the first word is the
op-code and the second word is the starting address of the subroutine
(one word = 2 bytes). The CALL instruction is implemented as follows:
• Store the current value of PC in the stack
• Store the value of PSW register in the stack
• Load the starting address of the subroutine in PC
The content of PC just before the fetch of a CALL instruction is
(5FA0)16. After execution of the CALL instruction, the value of the
stack pointer is
a) (016A)16 b) (016C)16
c) (0170)16 d) (0172)16
[GATE-2015-2]
Q.43 Consider the following code sequence having five instructions I1
to I5. Each of these instructions has the following format.
OP Ri, Rj, Rk
where operation OP is performed on contents of registers Rj and Rk and
the result is stored in register Ri.
I1 : ADD R1, R2, R3
I2 : MUL R7, R1, R3
I3 : SUB R4, R1, R5
I4 : ADD R3, R2, R4
I5 : MUL R7, R8, R9
Consider the following three statements.
S1: There is an anti-dependence between instructions I2 and I5
S2: There is an anti-dependence between instructions I2 and I4
S3: Within an instruction pipeline an anti-dependence always creates one
or more stalls
Which one of the above statements is/are correct?
a) Only S1 is true
b) Only S2 is true
c) Only S1 and S3 are true
d) Only S2 and S3 are true
[GATE-2015-3]
Q.44 The stage delays in a 4-stage pipeline are 800, 500, 400 and 300
picoseconds. The first stage (with delay 800 picoseconds) is replaced
with a functionally equivalent design involving two stages with
respective delays 600 and 350 picoseconds. The throughput increase of
the pipeline is ______ percent.
[GATE-2016-1]
Q.45 A processor has 40 distinct instructions and 24 general purpose
registers. A 32-bit instruction word has an opcode, two register
operands and an immediate operand. The number of bits available for the
immediate operand field is ______.
[GATE-2016-2]
Q.46 Consider a processor with 64 registers and an instruction set of
size twelve. Each instruction has five distinct fields, namely, opcode,
two source register identifiers, one destination register identifier,
and a twelve-bit immediate value. Each instruction must be stored in
memory in a byte-aligned fashion. If a program has 100 instructions, the
amount of memory (in bytes) consumed by the program text is ______.
[GATE-2016-2]
Q.47 Consider a 3 GHz (gigahertz) processor with a three-stage pipeline
and stage latencies τ1, τ2, and τ3 such that τ1 = 3τ2/4 = 2τ3. If the
longest pipeline stage is split into two pipeline stages of equal
latency, the new frequency is ______ GHz, ignoring delays in the
pipeline registers.
[GATE-2016-2]
Q.48 Suppose the functions F and G can be computed in 5 and 3
nanoseconds by functional units UF and UG, respectively. Given two
instances of UF and two instances of UG, it is required to implement the
computation F(G(Xi)) for 1 ≤ i ≤ 10. Ignoring all other delays, the
minimum time required to complete this computation is ______
nanoseconds.
[GATE-2016-2]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(c) (b) (c) (b) (c) (d) (d) (a) (a) (c) (a) (b) (a) (b)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(a) (c) (a) (b) (a) (a) (d) (a) (a) (b) (b) (a) (d) (d)
29 30 31 32 33 34 35 36 37 38 39 40 41 42
(b) (b) (d) (c) (d) (b) 16383 4 1.6 (c) 1.54 (a) 3.2 13
43 44 45 46 47 48
(d) 33.33 16 500 4 28
EXPLANATIONS
Q.1 (c)
Indirect addressing: Pointers
Immediate addressing: Constants
Auto-decrement addressing: Loops
Q.2 (b)
As per the given, there are two time calculations, one on a pipelined
CPU and the other on a non-pipelined CPU. The catch here is that the
non-pipelined CPU is identical to the pipelined CPU. In such cases,
T1 ≥ T2, since the pipeline registers add latency to a single
instruction.
Q.3 (c)
The instruction opcode specifies the operation that needs to be
performed and is not a form of memory, whereas the instruction cache,
instruction register and translation look-aside buffer are forms of
memory.
Q.4 (b)
An overflow cannot occur when a positive number is added to a negative
number. It can occur only when the two numbers added are both positive
or both negative. If the two numbers being added are signed, the sign
bit is treated as part of the number itself and the end carry does not
indicate an overflow.
Q.5 (c)
The addressing modes suitable for program relocation at run-time are
based addressing and relative addressing, as in relative addressing
there is a field corresponding to an offset or displacement D. Thus the
address is given as R + D, where R is a CPU register.
Now, when the content of R is changed, the processor changes the
absolute address referred to by a block of instructions. In this way the
processor is able to move the entire block from one region of main
memory to another.
Q.6 (d)
In the first statement a RAW (Read After Write) hazard occurs, as the
(j+1)th instruction uses the result of the jth instruction as an
operand; thus a data dependency occurs here.
In the second statement a control dependency occurs, as there is the
execution of a conditional jump instruction which causes flushing.
The third statement causes a structural hazard (resource conflict), as
both the jth and (j+1)th instructions require the ALU at the same time.
Thus all the three statements cause hazards.
Q.7 (d)
All the given statements are reasons why the performance of a pipelined
processor suffers. When the pipeline stages have different delays, the
performance is affected. When consecutive instructions are dependent on
each other, it has a negative effect on the performance of the
processor. And when the pipeline stages share hardware resources, the
resulting conflicts also reduce performance.
Q.8 (a)
As per the given conditions, since the initial value of register A is
A0, the value of register B will be the number of 0 bits in A0 after the
program gets executed.
Q.9 (a)
The value of register A will remain the same when the instruction
RRC A, # is inserted, as this instruction does not affect the value
stored in the register.
Q.10 (c)
The stage delays in the pipeline are 150, 120, 160 and 140 (total
570). For a k-stage pipeline which processes n tasks with clock period
t, the total time Tk is
Tk = [k + (n - 1)] × t
The maximum stage delay tm is 160. Therefore, t = 160 + 5 = 165 ns.
As k = 4 and n = 1000,
T = [4 + (1000 - 1)] × 165 ns = 165495 ns ≈ 165.5 μs
Q.11 (a)
The following table gives each instruction, its size and the location
(in decimal) which it occupies.
Instruction     Instruction size   Location
MOV R1, 1000    2                  1000 to 1007
MOV R2, R1      1                  1008 to 1011
ADD R2, R3      1                  1012 to 1015
MOV 1005, R2    2                  1016 to 1023
HALT            1                  1024 to 1027
The return address saved on the stack is 1024 when an interrupt occurs;
after executing the HALT statement the CPU gets halted.
Q.12 (b)
Operation               Instruction size (in words)
R1 ← Memory [5000]      2
R2 ← Memory [R1]        1
R1 ← R2 + R3            1
Memory [6000] ←         2
Machine halts           1
Therefore, the required clock cycles as per the given situation become,
in the order as above:
6 + 2 = 8
3 + 2 = 5
1 = 1
6 + 2 = 8
1 + 1 = 2
Thus, the total comes out to be 24.
Q.13 (a)
Given that R0 ← R0 + R1
The clock cycles operate as follows:
Cycle 1
Out : R1
In : S
Cycle 2
Out : R2
In : T
Cycle 3
Out : S, T
Add : ALU
In : R
Therefore, the execution cycle is completed in 3 clock cycles.
Q.14 (b)
As given
Rn ← PC + 1;
PC ← M[PC]
The clock cycles operate as follows:
Cycle 1
Out : PC
In : S, MAR
Cycle 2
Out : S
Increment : ALU
In : Rn
Cycle 3
Out : MDR
In : PC
Therefore, the execution cycle is completed in 3 clock cycles.
Q.15 (a)
Clock cycle   R0 = M[loc1]   R0 = R0 + R0   R2 = R2 - R0
1             IF
2             RD
3             EX             IF
4             MA             RD
5             WB             EX
6                            MA             IF
7                            WB             RD
8                                           EX
9                                           MA
10                                          WB
Thus, total number of clock cycles required = 10
Q.16 (c)
Following are the memory cycles needed during the execution cycle:
First memory cycle: Read the value of A to calculate the index address
Second memory cycle: Add R0 to get the address of the first source and
read its value
Third memory cycle: Read the contents of B
Fourth memory cycle: Read the value of the contents of B
Fifth memory cycle: Write the answer
→ Total memory cycles needed during the execution cycle of the
instruction = 5
Q.17 (a)
1. A[I] = B[J]: indexed addressing mode
2. while (*A++): auto-increment
3. int temp = *x: indirect addressing
Q.18 (b)
As given, there is no conditional branch for 80% of the 10^9
instructions, thus only a single cycle is required for them and the
rest require 2 extra cycles.
Thus the total time required = Time of one cycle
× (80/100 × 10^9 × 1 + 20/100 × 10^9 × 2)
= 1/1G × (80/100 × 10^9 × 1 + 20/100 × 10^9 × 2)
= 1.2 s
Q.19 (a)
From the given conditions it can be determined that we have to set all
the other bits to 0 in temp, as the position pos is the only deciding
factor in jumping to the label. If 1 is shifted left by pos places, then
the mask register has a 1 in position pos, which is what is required.
Thus, the option mask ← 0x1 << pos is correct.
Q.20 (a)
Each address is a multiple of 3, as the starting address is 300 and each
instruction consists of 24 bits, i.e. 3 bytes.
Thus, among the given options, the valid counter value will be the one
which is a multiple of 3. Out of the options we can see that only 600
satisfies the condition.
Therefore, it is 600.
Q.21 (d)
The memory reference required by the instruction R1 ← M[3000] is 1,
whereas the memory references required by R2 ← M[R3] and M[R3] ← R2 are
10 each.
Therefore, total memory references required = 2 × 10 + 1 = 21
Q.22 (a)
The content of memory location 2010 doesn't change: when the value in
register R1 becomes zero the BNZ loop exits, and by then the address in
R3 has become 2010. Thus, the content of the location still remains 100.
Q.23 (a)
The locations of the various operations are given in the following
table:
Location      Operation
1000 - 1001   R1 ← M[3000]
1002          R2 ← M[R3]
1003          R2 ← R1 + R2
1004          M[R3] ← R2
1005          R3 ← R3 + 1
Thus, the location of the instruction INC R3 is 1005, and if an
interrupt occurs this is the address pushed on the stack.
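The location table in Q.23 can be reproduced by accumulating instruction sizes. The following is a minimal sketch, assuming word-addressable memory, a program that starts at location 1000, and the instruction sizes listed below (the first instruction is assumed to occupy two words, which is what the locations in the table imply); the helper name `start_locations` is illustrative only.

```python
from itertools import accumulate

# Assumed program layout: (operation, size in words).
program = [
    ("R1 <- M[3000]", 2),
    ("R2 <- M[R3]", 1),
    ("R2 <- R1 + R2", 1),
    ("M[R3] <- R2", 1),
    ("R3 <- R3 + 1", 1),
]

def start_locations(program, base=1000):
    """Return the starting word address of every instruction."""
    sizes = [size for _, size in program]
    # Each instruction starts where the previous one ends.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [base + off for off in offsets]

for (op, _), loc in zip(program, start_locations(program)):
    print(loc, op)
```

Under these assumptions the last line printed is the location of the increment instruction.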
Q.24 (b)
As given, the pipelined processor has four stages, i.e. IF, ID, EX, WB.
The number of clock cycles required in the EX stage is 1 for ADD and SUB
instructions and 3 for MUL instructions.
In the pipelined processor, while one instruction is fetched, another is
either being decoded or executed, or some other action is being
performed.
Thus, the number of cycles required by the given set of instructions can
be obtained from the following diagram:
Clock   ADD   MUL   SUB
1       IF
2       ID    IF
3       EX    ID    IF
4       WB    EX    ID
5             EX
6             EX
7             WB    EX
8                   WB
Thus, the total number of clock cycles required is 8.
Q.25 (b)
The first instruction in the fall-through path is executed for all the
delayed conditional branch instructions, irrespective of whether the
condition evaluates to true or false.
Q.26 (a)
Because of the data dependency of instruction I4, instruction I2
occupies the delay slot.
Q.27 (d)
When an RFE (Return From Exception) instruction is executed, no
exception is allowed to occur during that time; it must also be a trap
instruction and a privileged one.
Thus all the three statements are true as far as RFE is concerned.
Q.28 (d)
When i = 1
Number of cycles needed to execute the given loop
= 2 + 1 + 3 + 2 + 2 + 3 + 2 = 15
Thus total cycles required = 2 × 15 = 30
Q.29 (b)
As per the given, the instructions are arranged according to their
meanings. We get the following:
[Figure: timing diagram of the instruction sequence]
Here, we can see that the last operation (Write Operand) comes at the
15th clock cycle, so it takes 15 clock cycles to execute the given
sequence of instructions.
Q.30 (b)
Speed up = (5 + 6 + 11 + 8) / (11 + 1) = 30/12 = 2.5
Q.31 (d)
The addressing mode will be base index addressing. Here, 20 will work as
the base and the content of R2 will be the index.
Q.32 (c)
Register renaming is done in pipelined processors to handle certain
kinds of hazards.
Q.33 (d)
The following sequence of micro-operations
MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Analysis
1. The first micro-operation stores the value of PC into the Memory
Buffer Register (MBR).
2. The second micro-operation stores the value of X into the Memory
Address Register (MAR).
3. The third micro-operation stores the value of Y into PC.
4. The fourth micro-operation stores the value of MBR to memory.
So before execution of these micro-operations, PC holds the address of
the next instruction to be executed. We first store the value of PC in
MBR and then, through MBR, in memory, i.e., we are saving the value of
PC in memory and then loading PC with a new value. This can be done only
in two types of operations: conditional branch and interrupt service.
As we are not checking here for any condition, it is an initiation of
interrupt service.
Q.34 (b)
Instruction pipeline with five stages without any branch prediction:
Delays for FI, DI, FO, EI and WO are 5, 7, 10, 8 and 6 ns respectively.
The maximum time taken by any stage is 10 ns and an additional 1 ns is
required for the delay of the buffer.
∴ The total time for an instruction to pass from one stage to another is
11 ns.
The instructions are executed in the following order:
I1, I2, I3, I4, I9, I10, I11, I12
Execution with time:
Now when I4 is in its execution stage we detect the branch, and when I4
is in the WO stage we fetch I9. The time for execution of instructions
I1 to I4 is 11 × 5 + (4 - 1) × 11 = 88 ns, and likewise the time for
execution of instructions I9 to I12 is 11 × 5 + (4 - 1) × 11 = 88 ns.
But we save 11 ns when fetching I9, i.e., I9 requires only 44 ns
additional instead of 55 ns, because the time for fetching I9 can be
overlapped with WO of I4.
∴ Total time = 88 + 88 - 11 = 165 ns
Q.35 (16383)
1 word = 32 bits
Each instruction has 32 bits.
To support 45 instructions, the opcode must contain 6 bits.
Register operand 1 requires 6 bits, since the total number of registers
is 64.
Register operand 2 also requires 6 bits.
32 - (6 + 6 + 6) = 14 bits are left over for the immediate operand.
Using 14 bits, we can give a maximum of 16383, since 2^14 = 16384 (from
0 to 16383).
Q.36 (4)
For 6 stages, non-pipelined execution takes 6 cycles.
There were 2 stall cycles for pipelining for 25% of the instructions.
So pipeline time = 1 + (25/100) × 2 = 3/2 = 1.5
Speed up = Non-pipeline time / Pipeline time = 6/1.5 = 4
Q.37 (1.6)
1 cycle time for P1 = 1 / (1 GHz) = 1 ns
Assume P1 takes 5 cycles for a program; then P2 takes 20% more, that
is, 6 cycles.
P2 takes 25% less time: if P1 takes 5 ns, then P2 takes 3.75 ns.
Assume the P2 clock frequency is x GHz.
P2 takes 6 cycles, so 6 × (1/x) ns = 3.75 ns, which gives x = 1.6.
Q.38 (c)
Clock period (CP) = max stage delay + overhead
So CP(P1) = max(1, 2, 2, 1) = 2 ns
CP(P2) = max(1, 1.5, 1.5, 1.5) = 1.5 ns
CP(P3) = max(0.5, 1, 1, 0.6, 1) = 1 ns
CP(P4) = max(0.5, 0.5, 1, 1, 1.1) = 1.1 ns
As frequency ∝ 1/CP, the least clock period will give the highest peak
clock frequency. So, f(P3) = 1 / (1 ns) = 1 GHz
Q.39 (1.54)
             No. of stages   Stall cycles   Stall frequency   Clock period   Avg. access time
Old design   5               2              20%               2.2 ns         P
New design   8               5              20%               1 ns           Q
P = [80% × (1 clock) + 20% × (1 completion + 2 stall clocks)] × Tcp
P = (0.8 + 0.6) × 2.2 ns = 3.08 ns
Q = [80% × (1 clock) + 20% × (1 completion + 5 stall clocks)] × Tcp
Q = (0.8 + 1.2) × 1 ns = 2 ns
So the value of P/Q = 3.08 ns / 2 ns = 1.54
Q.43 (d)
I1) R1 ← R2 + R3
I2) R7 ← R1 × R3
I3) R4 ← R1 - R5
I4) R3 ← R2 + R4
I5) R7 ← R8 × R9
Anti-dependence:
i) ------ = x
j) x = ------
Then i and j are anti-dependent.
Hence I2 and I4 are anti-dependent.
⇒ Anti-dependence creates a stall in the pipeline.
Q.44 (33.33)
Old design tp = 800
New design tp = 600
Throughput increase = (800 - 600) / 600 × 100% = 33.33%
Q.45 (16)
40 instructions need a 6-bit opcode and 24 registers need 5 bits per
register operand, so 32 - (6 + 5 + 5) = 16 bits remain.
So 16 bits are available for the immediate operand field.
Q.47 (4)
Old pipeline: tp = (4/3) τ1
New pipeline: tp = τ1
The clock period shrinks by a factor of 4/3, so the new frequency is
3 GHz × 4/3 = 4 GHz.
Q.42 (13)
I ⇒ Instruction Fetch and Decode
O ⇒ Operand Fetch
P ⇒ Perform operation
W ⇒ Write back the result
Clock  1  2  3  4  5  6  7  8  9 10 11 12 13
MUL    I  O  P  P  P  W
DIV       I  O  -  -  P  P  P  P  P  W
ADD          I  -  -  O  -  -  -  -  P  W
SUB             I  -  -  O  -  -  -  -  P  W
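The 13-cycle result of Q.42 can be cross-checked with a small timing model. This is only a sketch under stated assumptions, not the book's method: instructions are fetched in order, one per cycle; a single PO unit is used in program order; and a result produced in PO can be consumed by a dependent instruction's PO in the following cycle (operand forwarding). The function name `total_cycles` and the latency table are illustrative, and the model tracks only the PO and WB cycles, not the exact cycle of each OF.

```python
# Hypothetical timing model for a 4-stage pipeline (IF, OF, PO, WB).
PO_LATENCY = {"ADD": 1, "SUB": 1, "MUL": 3, "DIV": 5}

def total_cycles(instructions):
    """instructions: list of (op, dest, src1, src2).

    Returns the cycle in which the last instruction writes back,
    under the assumptions stated in the lead-in.
    """
    po_end = {}        # register -> cycle in which its producer leaves PO
    prev_po_end = 0    # cycle in which the previous instruction leaves PO
    last_wb = 0
    for i, (op, dest, src1, src2) in enumerate(instructions):
        if_cycle = i + 1                 # one fetch per cycle
        start = if_cycle + 2             # IF, then OF, then PO at the earliest
        # PO is a single unit shared in program order.
        start = max(start, prev_po_end + 1)
        # A forwarded operand is usable the cycle after its producer's PO.
        for src in (src1, src2):
            if src in po_end:
                start = max(start, po_end[src] + 1)
        prev_po_end = start + PO_LATENCY[op] - 1
        po_end[dest] = prev_po_end
        last_wb = prev_po_end + 1        # WB follows PO
    return last_wb

program = [
    ("MUL", "R5", "R0", "R1"),
    ("DIV", "R6", "R2", "R3"),
    ("ADD", "R7", "R5", "R6"),
    ("SUB", "R8", "R7", "R4"),
]
print(total_cycles(program))
```

For the four instructions of the question this model gives the same total as the diagram; the 5-cycle DIV in the PO stage is what dominates the schedule.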
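Several of the pipeline explanations in this chapter (Q.10, Q.30, Q.36, Q.39 and Q.44) reduce to a handful of formulas. The sketch below collects them so the arithmetic can be re-run; the function names are illustrative, and the inputs are the figures quoted in the explanations (for Q.30, the stage delays 5, 6, 11, 8 and register delay 1 are read off the speed-up formula given there).

```python
def pipeline_time(k, n, t):
    """Time for n tasks on a k-stage pipeline with clock period t."""
    return (k + n - 1) * t

def speedup_vs_nonpipelined(stage_delays, register_delay):
    """Steady-state speed-up: non-pipelined time over one clock period."""
    return sum(stage_delays) / (max(stage_delays) + register_delay)

def speedup_with_stalls(stages, stall_fraction, stall_cycles):
    """Speed-up of a balanced pipeline when some instructions stall."""
    return stages / (1 + stall_fraction * stall_cycles)

def avg_instruction_time(branch_fraction, stall_cycles, clock_period):
    """Average time per instruction when only branches stall."""
    cpi = (1 - branch_fraction) + branch_fraction * (1 + stall_cycles)
    return cpi * clock_period

def throughput_increase_percent(old_period, new_period):
    """Throughput is 1/period, so the gain is (old - new) / new."""
    return (old_period - new_period) / new_period * 100

print(pipeline_time(4, 1000, 165))                      # Q.10, in ns
print(speedup_vs_nonpipelined([5, 6, 11, 8], 1))        # Q.30
print(speedup_with_stalls(6, 0.25, 2))                  # Q.36
p = avg_instruction_time(0.20, 2, 2.2)                  # Q.39, old design
q = avg_instruction_time(0.20, 5, 1.0)                  # Q.39, new design
print(round(p / q, 2))
print(round(throughput_increase_percent(800, 600), 2))  # Q.44
```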
3 CPU CONTROL DESIGN & INTERFACES
Q.1 Which is the most appropriate match for the items in the first
column with the items in the second column:
List I                        List II
P. Indirect addressing        1. Array implementation
Q. Indexed addressing         2. Writing relocatable code
R. Base register addressing   3. Passing array as parameter
a) P-3, Q-1, R-2 b) P-2, Q-3, R-1
c) P-3, Q-2, R-1 d) P-1, Q-3, R-2
[GATE-2001]
Q.2 Consider the following data path of a simple non-pipelined CPU. The
registers A, B, A1, A2, MDR, the bus and the ALU are 8-bit wide. SP and
MAR are 16-bit registers. The MUX is of size 8 × (2:1) and the DEMUX is
of size 8 × (1:2). Each memory operation takes 2 CPU clock cycles and
uses MAR (Memory Address Register) and MDR (Memory Data Register). SP
can be decremented locally.
[Figure: data path of the CPU]
The CPU instruction "push r", where r = A or B, has the specification
M[SP] ← r
SP ← SP - 1
How many CPU clock cycles are needed to execute the "push r"
instruction?
a) 2 b) 3
c) 4 d) 5
[GATE-2001]
Q.3 A processor needs software interrupt to
a) test the interrupt system of the processor
b) implement co-routines
c) obtain system services which need execution of privileged
instructions
d) return from subroutine
[GATE-2002]
Q.4 In the absolute addressing mode
a) the operand is inside the instruction
b) the address of the operand is inside the instruction
c) the register containing the address of the operand is specified
inside the instruction
d) the location of the operand is implicit
[GATE-2002]
Q.5 A device employing INTR line for device interrupt puts the CALL
instruction on the data bus while
a) INTA is active
b) HOLD is active
c) READY is active
d) None of these
[GATE-2002]
Q.6 Consider an array multiplier for multiplying two n-bit numbers. If
each gate in the circuit has a unit delay, the total delay of the
multiplier is
a) Θ(1) b) Θ(log n)
c) Θ(n) d) Θ(n^2)
[GATE-2004]
Q.7 Which one of the following is true for a CPU having a single
interrupt request line and a single interrupt grant line?
a) Neither vectored interrupt nor multiple interrupting devices are
possible
b) Vectored interrupts are not possible but multiple interrupting
devices are possible
c) Vectored interrupts and multiple interrupting devices are both
possible
d) Vectored interrupt is possible but multiple interrupting devices are
not possible
[GATE-2005]
Q.8 Normally user programs are prevented from handling I/O directly by
I/O instructions in them. For CPUs having explicit I/O instructions,
such I/O protection is ensured by having the I/O instructions
privileged. In a CPU with memory mapped I/O, there is no explicit I/O
instruction. Which one of the following is true for a CPU with memory
mapped I/O?
a) I/O protection is ensured by operating system routine(s)
b) I/O protection is ensured by a hardware trap
c) I/O protection is ensured during system configuration
d) I/O protection is not possible
[GATE-2005]
Q.9 Horizontal micro-programming
a) does not require use of signal decoders
b) results in larger sized micro-instructions than vertical
micro-programming
c) uses one bit for each control signal
d) All of the above
[GATE-2006]
Q.10 Which of the following is/are true of the auto-increment
addressing mode?
1. It is useful in creating self-relocating code.
2. If it is included in an Instruction Set Architecture, then an
additional ALU is required for effective address calculation.
3. The amount of increment depends on the size of the data item
accessed.
a) 1 only b) 2 only
c) 3 only d) 2 and 3
[GATE-2008]
Q.11 A CPU generally handles an interrupt by executing an interrupt
service routine
a) as soon as an interrupt is raised
b) by checking the interrupt register at the end of fetch cycle
c) by checking the interrupt register after finishing the execution of
the current instruction
d) by checking the interrupt register at fixed time intervals
[GATE-2009]
Q.12 On a non-pipelined sequential processor, a program segment, which
is a part of the interrupt service routine, is given to transfer 500
bytes from an I/O device to memory.
Initialize the address register
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at address given by address register
Increment the address register
Decrement the count
If count != 0 go to LOOP
Assume that each statement in this program is equivalent to a machine
instruction which takes one clock cycle to execute if it is a
non-load/store instruction. The load-store instructions take two clock
cycles to execute. The designer of the system also has an alternate
approach of using the DMA controller to implement the same transfer.
The DMA controller requires 20 clock cycles for
initialization and other overheads. Each DMA transfer cycle takes two
clock cycles to transfer one byte of data from the device to the memory.
What is the approximate speed up when the DMA controller based design is
used in place of the interrupt driven program based input-output?
a) 3.4 b) 4.4
c) 5.1 d) 6.7
[GATE-2011]
Q.13 The amount of ROM needed to implement a 4 bit multiplier is
a) 64 bit b) 128 bit
c) 1 kbit d) 2 kbit
[GATE-2012]
Q.14 Consider a main memory system that consists of 8 memory modules
attached to the system bus, which is one word wide. When a write request
is made, the bus is occupied for 100 nanoseconds (ns) by the data,
address, and control signals. During the same 100 ns, and for 500 ns
thereafter, the addressed memory module executes one cycle accepting and
storing the data. The (internal) operation of different memory modules
may overlap in time, but only one request can be on the bus at any time.
The maximum number of stores (of one word each) that can be initiated in
1 millisecond is ______
[GATE-2014-2]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(a) (b) (c) (d) (a) (c) (b) (a) (c) (c) (c) (a) (d) 10000
EXPLANATIONS
Q.1 (a)
P. Indirect addressing → Passing array as parameter
Q. Indexed addressing → Array implementation
R. Base register addressing → Writing relocatable code
Q.2 (b)
From the given data it can be determined that the number of CPU clock
cycles required to execute the "push r" instruction is 3.
Q.3 (c)
Some of the operations in the system can be performed only in a mode
called supervisor mode. A software interrupt is an interrupt that is
raised by executing an instruction. It can be used to interrupt a
procedure at any desired location and is most importantly associated
with a supervisor call, which provides the ability to switch from CPU
user mode to supervisor mode.
Q.4 (d)
Q.5 (a)
When INTR is high and interrupts are enabled, the microprocessor
completes the current instruction, disables further interrupts and
simultaneously sends an acknowledgement on INTA, which is active low,
indicating that an interrupt is being serviced. During this time another
interrupt cannot occur until the interrupt flip-flop is enabled again.
Q.6 (c)
In an (n × n) array, a signal passes through 2n - 1 gate cells. If each
unit cell has a delay of Θ(1), then the total delay comes out to be
(2n - 1) × Θ(1), which corresponds to Θ(n).
Q.7 (b)
In a single-line interrupt system, vectored interrupts are not possible
but multiple interrupting devices are possible. A single-line interrupt
system consists of a single interrupt request line and a single
interrupt grant line. In such a system it is possible that more than one
device requests an interrupt at the same time; in such cases only one
request is granted, according to priority, as is depicted by the
following figure, and the interrupt is granted to that single request
only.
[Figure]
Q.8 (a)
To find the solution, following are the points to keep in mind:
1. An address bit pattern assigned to memory cannot be assigned to an
I/O port and vice versa.
2. Memory-mapped I/O requires that the same set of addresses is shared
by the memory locations and I/O ports.
Therefore, I/O protection is ensured by the operating system routine(s).
Q.9 (c)
Detection of concurrently executable micro-operations is an important
consideration for effective
horizontal micro-programming. Since it is highly machine dependent and
requires knowledge of highly intricate features of a machine, only
limited effort has been made so far to derive an algorithm for
micro-program parallelism to enable optimization of horizontal
micro-programs. Therefore, in horizontal microprogramming, one bit is
used for each control signal.
Q.14 (10000)
For each write request, the bus is occupied for 100 ns. The addressed
module stays busy for 600 ns in total, but with 8 modules the stores can
be overlapped, so a new store can be initiated every 100 ns.
In 100 ns: 1 store
In 100/10^6 ms: 1 store
In 1 ms: 10^6/100 stores
= 10000 stores
Q.10 (c)
In the auto-increment addressing mode, the amount of increment depends
purely on the size of the data item accessed. For example:
Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
Regs[R2] ← Regs[R2] + d
where d is the size of the data that is being accessed.
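The two register transfers in the Q.10 example can be mimicked in a few lines. This is a toy sketch, assuming registers and memory are modelled as dictionaries and that each data item is 4 bytes; the helper name `load_autoincrement` and the sample values are made up for illustration.

```python
def load_autoincrement(regs, mem, dst, ptr, size):
    """Add Mem[Regs[ptr]] to Regs[dst], then advance Regs[ptr] by size."""
    regs[dst] += mem[regs[ptr]]
    regs[ptr] += size

regs = {"R1": 0, "R2": 100}
mem = {100: 7, 104: 5}      # two 4-byte items at addresses 100 and 104
load_autoincrement(regs, mem, "R1", "R2", 4)
load_autoincrement(regs, mem, "R1", "R2", 4)
print(regs)
```

After the two calls R2 has advanced by twice the item size, which is the point of statement 3 in the question.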
Q.11 (c)
The interrupt register is checked
after finishing the execution of the
current instruction. At this time, a
CPU generally handles an interrupt
by the execution of an interrupt
service routine.
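Q.12 (a)
Number of clock cycles required by the load-store (interrupt-driven)
approach = 2 + 500 × 7 = 3502: the two initialization instructions take
1 cycle each, and each loop iteration takes 2 (load) + 2 (store) + 1 + 1
+ 1 = 7 cycles.
Number of clock cycles required using DMA = 20 + 500 × 2 = 1020
Required speed up = 3502/1020 ≈ 3.4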
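The cycle counts in Q.12 can be recomputed directly from the instruction costs in the question. A minimal sketch, with illustrative function names and default parameters taken from the problem statement:

```python
def program_cycles(n_bytes, init_instrs=2, load_store=2, other=1):
    """Cycles for the interrupt-service loop.

    Each iteration has a load, a store, an increment, a decrement and a
    branch; the initialization instructions run once.
    """
    per_iteration = 2 * load_store + 3 * other
    return init_instrs * other + n_bytes * per_iteration

def dma_cycles(n_bytes, overhead=20, per_byte=2):
    """Cycles for the DMA approach: fixed overhead plus per-byte transfers."""
    return overhead + n_bytes * per_byte

prog = program_cycles(500)
dma = dma_cycles(500)
print(prog, dma, round(prog / dma, 1))
```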
Q.13 (d)
The normal size of a ROM is n × 2^n.
∴ Now, we are multiplying two n-bit numbers, so the ROM is addressed by
2n input bits.
So, the resultant has 2n bits.
Hence, the size of the ROM is 2n × 2^(2n).
In the question n = 4.
Hence ⇒ (2 × 4) × 2^(2 × 4)
⇒ 8 × 2^8 ⇒ 2^3 × 2^8
⇒ 2 × 2^10 ⇒ 2 Kbit
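The ROM size in Q.13 follows from treating the multiplier as a lookup table. A small sketch, assuming both n-bit operands together form the ROM address and each stored word holds the 2n-bit product; the function name is illustrative.

```python
def multiplier_rom_bits(n):
    """Bits of ROM needed to look up the product of two n-bit numbers."""
    address_bits = 2 * n    # both operands together form the address
    word_bits = 2 * n       # the product needs up to 2n bits
    return (2 ** address_bits) * word_bits

bits = multiplier_rom_bits(4)
print(bits, "bits =", bits // 1024, "Kbit")
```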
4 SECONDARY MEMORY & DMA
Q.1 What is the swap space in the disk used for?
a) Saving temporary HTML pages
b) Saving process data
c) Storing the super-block
d) Storing device drivers
[GATE-2005]
Q.2 The micro-instruction stored in the control memory of a processor
has a width of 26 bits. Each micro-instruction is divided into three
fields: a micro-operation field of 13 bits, a next address field (X),
and a MUX select field (Y). There are 8 status bits in the inputs of the
MUX.
How many bits are there in the X and Y fields, and what is the size of
the control memory in number of words?
a) 10, 3, 1024 b) 8, 5, 256
c) 5, 8, 2048 d) 10, 3, 512
[GATE-2006]
Q.3 A hard disk with a transfer rate of 10 Mbyte/s is constantly
transferring data to memory using DMA. The processor runs at 600 MHz,
and takes 300 and 900 clock cycles to initiate and complete DMA transfer
respectively. If the size of the transfer is 20 Kbyte, what is the
percentage of processor time consumed for the transfer operation?
a) 5.0% b) 1.0%
c) 0.5% d) 0.1%
[GATE-2006]
Q.4 A device with data transfer rate 10 Kbyte/s is connected to a CPU.
Data is transferred byte wise. Let the interrupt overhead be 4 μs. The
byte transfer time between the device interface register and CPU or
memory is negligible. What is the minimum performance gain of operating
the device under interrupt mode over operating it under program
controlled mode?
a) 15 b) 25
c) 35 d) 45
[GATE-2006]
Q.5 Consider a disk drive with the following specifications:
16 surfaces, 512 tracks/surface, 512 sectors/track, 1 Kbyte/sector,
rotation speed 3000 rpm. The disk is operated in cycle stealing mode
whereby whenever one 4 byte word is ready it is sent to memory;
similarly, for writing, the disk interface reads a 4 byte word from the
memory in each DMA cycle.
Memory cycle time is 40 ns. The maximum percentage of time that the CPU
gets blocked during DMA operation is
a) 10 b) 25
c) 40 d) 50
[GATE-2006]
Q.6 Consider a disk pack with 16 surfaces of 128 tracks per surface and
256 sectors per track. 512 bytes of data are stored in a bit serial
manner in a sector. The capacity of the disk pack and the number of bits
required to specify a particular sector in the disk are respectively
a) 256 Mbyte, 19 bit
b) 256 Mbyte, 28 bit
c) 512 Mbyte, 20 bit
d) 64 Gbyte, 28 bit
[GATE-2007]

Q.7 For a magnetic disk with concentric circular tracks, the seek
latency is not linearly proportional to the seek distance due to
a) non-uniform distribution of requests
b) arm starting and stopping inertia
c) higher capacity of tracks on the periphery of the platter
d) use of unfair arm scheduling policies
[GATE-2008]

Common Data for Questions 8 and 9
A hard disk has 63 sectors per track, 10 platters each with 2 recording
surfaces and 1000 cylinders. The address of a sector is given as a
triple (c, h, s), where c is the cylinder number, h is the surface
number and s is the sector number. Thus, the 0th sector is addressed as
(0, 0, 0), the 1st sector as (0, 0, 1), and so on.

Q.8 The address <400, 16, 29> corresponds to sector number
a) 505035 b) 505036
c) 505037 d) 505038
[GATE-2010]

Q.9 The address of the 1039th sector is
a) <0, 15, 31> b) <0, 16, 30>
c) <0, 16, 31> d) <0, 17, 31>
[GATE-2010]

Q.10 Consider a hard disk with 16 recording surfaces (0-15) having
16384 cylinders (0-16383) and each cylinder contains 64 sectors (0-63).
Data storage capacity in each sector is 512 bytes. Data are organized
cylinder-wise and the addressing format is
<cylinder no., surface no., sector no.>. A file of size 42797 KB is
stored in the disk and the starting disk location of the file is
<1200, 9, 40>. What is the cylinder number of the last sector of the
file, if it is stored in a contiguous manner?
a) 1281 b) 1282
c) 1283 d) 1284
[GATE-2013]

Q.11 Consider a disk pack with a seek time of 4 milliseconds and
rotational speed of 10000 rotations per minute (RPM). It has 600
sectors per track and each sector can store 512 bytes of data. Consider
a file stored in the disk. The file contains 2000 sectors. Assume that
every sector access necessitates a seek, and the average rotational
latency for accessing each sector is half of the time for one complete
rotation. The total time (in milliseconds) needed to read the entire
file is _________.
[GATE-2015-1]

Q.12 Consider a typical disk that rotates at 15000 rotations per
minute (RPM) and has a transfer rate of 50 × 10^6 bytes/sec. If the
average seek time of the disk is twice the average rotational delay and
the controller's transfer time is 10 times the disk transfer time, the
average time (in milliseconds) to read or write a 512-byte sector of
the disk is __________.
[GATE-2015-2]

Q.13 The size of the data count register of a DMA controller is 16
bits. The processor needs to transfer a file of 29,154 kilobytes from
disk to main memory. The memory is byte addressable. The minimum number
of times the DMA controller needs to get the control of the system bus
from the processor to transfer the file from the disk to main memory
is _____.
[GATE-2016-1]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13
(b) (a) (a) (b) (b) (a) (b) (c) (c) (d) 14020 6.1 456
EXPLANATIONS
Q.1 (b)
Suppose the CPU runs two processes. While one process is being
executed on the CPU, the other one is swapped out and all of its data
is saved on the disk; when the other one is in progress, all the data
of the first process is saved on the disk. Thus the swap space is
basically used for saving the process data.

Q.2 (a)
Each microinstruction is 26 bits wide, of which 13 bits form the
micro-operation field. The MUX has 8 status bits as inputs, so the
select field Y needs log2(8) = 3 bits. The next address field X gets
the remaining 13 - 3 = 10 bits. The size of the control memory is
therefore 2^10 = 1024 words.

Q.3 (a)
Size of the transfer = 20 Kbyte
Transfer rate of data = 10 Mbyte/s = 10 × 2^10 Kbyte/s
Time for the transfer = 20 / (10 × 2^10) s ≈ 1.95 ms
Processor time spent on the transfer = (300 + 900) clocks / 600 MHz
= 2 µs
Percentage of processor time = (2 µs / 1.95 ms) × 100 ≈ 0.1%

Q.4 (b)
The interrupt overhead for a single byte transfer is 4 µs, so under
interrupt mode up to 1 / (4 µs) = 25 × 10^4 byte/s could be handled.
The data transfer rate of the device is 10 Kbyte/s, i.e., 10^4 byte/s,
and under program controlled mode the CPU is tied up at exactly this
rate (100 µs per byte).
Therefore, the minimum performance gain = (25 × 10^4) / 10^4
= 100 µs / 4 µs = 25

Q.5 (b)
As given, rotation speed = 3000 rpm = 50 revolutions per second (rps).
512 Kbyte (one track) of data can be read in one revolution.
Number of 4-byte words read in one revolution = 2^19 / 4 = 2^17
Number of words read in one second = 2^17 × 50 = 6553600
Each word steals one memory cycle of 40 ns.
Time taken for 6553600 cycles = 6553600 × 40 ns = 0.2621 s
Therefore, the CPU is blocked for at most 0.2621 / 1 ≈ 26% of the time.
Thus, the answer is 25.

Q.6 (a)
Total disk size = Number of surfaces × Number of tracks × Number of
sectors × Capacity of each sector.
Therefore, from the given data, we get
Total disk size = 16 × 128 × 256 × 512 byte
= 2^28 byte = 2^8 × 2^20 byte = 2^8 Mbyte
= 256 MB
Total number of sectors = 16 × 128 × 256 = 2^19
So 19 bits are required to specify a particular sector.

Q.7 (b)
The seek latency is not linearly proportional to the seek distance
because of the arm's starting and stopping inertia. The arm has to
accelerate from rest and decelerate to a stop for every seek, and this
overhead is incurred whatever the distance, so a seek over twice the
distance takes less than twice the time.

Q.8 (c)
We have to find the sector number of the address <400, 16, 29>.
Each cylinder has 2 × 10 surfaces of 63 sectors each. Therefore,
400 × 2 × 10 × 63 + 16 × 63 + 29 = 504000 + 1008 + 29 = 505037

Q.9 (c)
The address <0, 16, 31> corresponds to the sector number
16 × 63 + 31 = 1039

Q.10 (d)
42797 KB = (42797 × 1024) / 512 = 85594 sectors
The starting cylinder 1200, from <1200, 9, 40> onwards, contains
24 + (6 × 64) = 408 sectors (the 24 remaining sectors on surface 9 and
the 6 full surfaces 10 to 15).
Next, cylinders 1201 to 1283 contain 1024 × 83 = 84992 sectors
(each cylinder contains 16 × 64 = 1024 sectors).
Total = 408 + 84992 = 85400 sectors
The remaining 85594 - 85400 = 194 sectors lie in the next cylinder.
So the required cylinder number, which contains the last sector of the
file, is 1284.

Q.11 (14020)
Given seek time = 4 ms
60 s → 10000 rotations
Rotation time = 60 / 10000 s = 6 ms for 1 rotation
Rotational latency = (1/2) × 6 ms = 3 ms
1 track → 600 sectors, and 1 rotation (6 ms) reads 1 track
1 sector → 6 ms / 600 = 0.01 ms
2000 sectors → 2000 × 0.01 = 20 ms
Total time needed to read the entire file
= 2000 × (4 + 3) + 20
= 8000 + 6000 + 20 = 14020 ms

Q.12 (6.1)
60 s → 15000 rotations
1 rotation = 60 / 15000 s = 4 ms
Average rotational delay = (1/2) × 4 = 2 ms
As per the question, average seek time = 2 × average rotational delay
= 2 × 2 = 4 ms
1 s → 50 × 10^6 bytes
Disk transfer time for 512 bytes = 512 / (50 × 10^6) s ≈ 0.01 ms
As per the question, controller's transfer time = 10 × 0.01 ms = 0.1 ms
Average time = 4 ms + 2 ms + 0.1 ms + 0.01 ms ≈ 6.1 ms

Q.13 (456)
With a 16-bit data count register, at most 2^16 bytes can be
transferred each time the DMA controller gets the bus.
Number of times the DMA controller needs the bus
= 29154 KB / 2^16 byte = (29154 × 2^10) / 2^16
= 455.53125, rounded up to 456
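The disk and DMA arithmetic in Q.8 to Q.13 above can be re-checked with a short script. The following Python sketch is only an illustration under the geometry stated in each question; the function names (chs_to_sector, last_cylinder, total_read_ms, average_access_ms, dma_bus_grants) are hypothetical helpers introduced here, not part of the original solutions.

```python
import math


def chs_to_sector(c, h, s, heads, sectors_per_track):
    """Linear sector number of address <c, h, s>, counting from 0 (Q.8, Q.9)."""
    return (c * heads + h) * sectors_per_track + s


def last_cylinder(start, file_sectors, heads, sectors_per_track):
    """Cylinder holding the last sector of a contiguously stored file (Q.10)."""
    c, h, s = start
    first = chs_to_sector(c, h, s, heads, sectors_per_track)
    last = first + file_sectors - 1
    return last // (heads * sectors_per_track)


def rotation_ms(rpm):
    """Time of one full rotation in milliseconds."""
    return 60_000 / rpm


def total_read_ms(seek_ms, rpm, sectors_per_track, n_sectors):
    """Q.11 model: every sector costs a seek, half a rotation and its own transfer."""
    rot = rotation_ms(rpm)
    per_sector = seek_ms + rot / 2 + rot / sectors_per_track
    return n_sectors * per_sector


def average_access_ms(rpm, bytes_per_sec, nbytes):
    """Q.12 model: seek = 2 x rotational delay, controller time = 10 x disk transfer."""
    rot_delay = rotation_ms(rpm) / 2
    seek = 2 * rot_delay
    transfer = nbytes / bytes_per_sec * 1000
    return seek + rot_delay + transfer + 10 * transfer


def dma_bus_grants(file_bytes, count_register_bits):
    """Q.13: bus acquisitions needed when each grant moves at most 2^bits bytes."""
    return math.ceil(file_bytes / 2 ** count_register_bits)


if __name__ == "__main__":
    print(chs_to_sector(400, 16, 29, heads=20, sectors_per_track=63))
    print(last_cylinder((1200, 9, 40), 42797 * 1024 // 512, 16, 64))
    print(dma_bus_grants(29154 * 1024, 16))
```

Working with a single linear sector number, as chs_to_sector does, avoids the case-by-case counting of partial surfaces and cylinders used in the hand solution of Q.10.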
ASSIGNMENT QUESTIONS
Q.1 From a given tautology, another tautology can be derived by
interchanging
a) 0 and 1
b) AND and OR
c) 0 and 1; AND and OR
d) impossible to always derive

Q.2 Which of the following logical operations produces a 0 if the
inputs are 1, 1 and 0?
a) OR
b) AND
c) Exclusive-OR
d) Exclusive-NOR

Q.3 Choose the correct answer.
If x is a Boolean variable, then
a) 0 + x = x b) 1 + x = x
c) x + x = x d) x + x' = 0

Q.4 If X, Y and Z are three Boolean variables then
a) X . X' = 1
b) X (Y + Z) = (X + Y) (X + Z)
c) X + XZ = X
d) X + Y = Y + X

Q.5 Which of the following codes needs 7 bits to represent a character?
a) ASCII b) BCD
c) EBCDIC d) GRAY

Q.6 Which of the following are not weighted codes?
a) Roman number system
b) Decimal number system
c) Excess-3 code
d) Binary number system

Q.7 The minimum time delay between the initiations of two independent
memory operations is called
a) access time b) cycle time
c) transfer rate d) latency time

Q.8 If X, Y and Z are 3 Boolean variables then X (Y + Z) equals
(X + Y) (X + Z), if X, Y, Z take the values
a) 1, 0, 0 b) 0, 1, 0
c) 1, 1, 0 d) 0, 1, 1

Q.9 Which of the following comments about the Program Counter (PC) are
true?
a) It is a register.
b) It is a cell in ROM.
c) During execution of the current instruction, its content changes.
d) None of the above

Q.10 If (123)_5 = (A3)_B, then the number of possible values of A is
a) 4 b) 1
c) 3 d) 2

Q.11 The speed imbalance between memory access and CPU operation can be
reduced by
a) cache memory
b) memory interleaving
c) reducing the size of memory
d) none of the above

Q.12 If (12A)_3 = (123)_A, then the value of A is
a) 3 b) 3 or 4
c) 2 d) none of the above

Q.13 Choose the correct statement.
a) By scanning a bit pattern, one can say whether it represents data
or not.
b) Whether a given piece of information is data or not depends on the
particular application.
c) Positive numbers can't be represented in 2's complement form.
d) Positive numbers can't be represented in 1's complement form.
Q.14 Which of the following does not need extra hardware for DRAM
refreshing?
a) 8085 b) Motorola-6800
c) Z-80 d) None of the above

Q.15 The advantage of MOS devices over bipolar devices is
a) it allows higher bit densities and is also cost effective
b) it is easy to fabricate
c) its higher impedance
d) its operational speed

Q.16 The Boolean expression X + X'Y equals
a) X + Y b) X + XY
c) Y + YX d) X'Y + Y'X

Q.17 (X + Y) + Z = X + (Y + Z)
a) shows that the Boolean operator OR is distributive
b) shows that the Boolean operator OR is associative
c) implies the associativity of the Boolean operator AND
d) None of the above

Q.18 Which of the following are registers?
a) Accumulator
b) Stack pointer
c) Program counter
d) Buffer

Q.19 Which of the following remarks about BCD are true?
a) It is an 8-4-2-1 weighted code
b) Complement of a number can be found efficiently
c) (12345678)_10 needs 4 bytes in BCD representation
d) Conversion to and from the decimal system can be done easily

Q.20 The first operating system used in microprocessors is
a) Zenix b) DOS
c) CP/M d) Multics

Q.21 Which of the following remarks about PLA is/are true?
a) It produces product of sums as the output.
b) It produces sum of products as the output.
c) It is dedicated for a particular operation.
d) It is general.

Q.22 Any given truth table can be represented by a
a) Karnaugh map
b) sum of products of Boolean expressions
c) product of sums of Boolean expressions
d) none of the above

Q.23 A number system uses 20 as the radix. The excess code that is
necessary for its equivalent binary coded representation is
a) 4 b) 5
c) 6 d) 7

Q.24 Choose the correct statement.
a) Bus is a group of information carrying wires.
b) Bus is needed to achieve reasonable speed of operation.
c) Bus can carry data or address.
d) A bus can be shared by more than one device.

Q.25 A + B can be implemented by
a) NAND gates alone
b) NOR gates alone
c) AND gates alone
d) None of the above

Q.26 Bipolar devices are desirable in the fabrication of which of the
following components?
a) Main memory
b) Cache memory
c) Micro program memory
d) all of the above

Q.27 Which of the following is the programmable interval timer?
a) 8251 b) 8250
c) 8253 d) 8275
Q.28 The idea of cache memory is based on the
a) property of locality of reference
b) fact that only a small portion of a program is referenced
relatively frequently
c) heuristic 90-10 rule
d) fact that references generally tend to cluster

Q.29 Which of the following weights makes the complement operation
easier in BCD form?
a) 8-4-2-1 b) Excess-3
c) 2-4-2-1 d) 3-2-1-0

Q.30 The sequence of events that happen during a typical fetch
operation is
a) PC → MAR → Memory → MDR → IR
b) PC → Memory → MDR → IR
c) PC → Memory → IR
d) PC → MAR → Memory → IR

Q.31 Any given Boolean expression can be implemented by using
a) Only NAND gates
b) Only NOR gates
c) Only OR gates
d) Only AND gates

Q.32 To get a Boolean expression in the product of sums form from a
given Karnaugh map
a) don't care conditions should not be present
b) don't care conditions, if present, should not be taken as zeroes
c) one should cover all the 0's present and complement the resulting
expression
d) one should cover all the 1's present and complement the resulting
expression

Q.33 The Boolean expression AB + AB' + A'C + AC is unaffected by the
value of the Boolean variable
a) A b) B
c) C d) none of the above

Q.34 The minimum number of gates required to implement the Boolean
expression AB + AB' + A'C is
a) 1 AND gate and 1 OR gate
b) 2 NAND gates
c) 3 AND gates and 2 OR gates
d) none of the above

Q.35 Property of locality of reference may fail if a program has
a) many conditional jumps
b) many unconditional jumps
c) many operands
d) none of the above

Q.36 Which of the following comments about the half adder are true?
a) It adds 2 bits.
b) It is called so because a full adder involves two half-adders.
c) It does half the work of a full adder.
d) It needs two inputs and generates two outputs.

Q.37 The binary equivalent of the decimal number 0.4375 is
a) 0.0111 b) 0.1011
c) 0.1100 d) 0.1010

Q.38 The Boolean expression (A + C)(AB' + AC)(A'C' + B') can be
simplified to
a) AB' b) AB + A'C
c) A'B + BC d) AB + BC

Q.39 A byte addressable computer has a memory capacity of 2^m Kbytes
and can perform 2^n operations. An instruction involving 3 operands and
one operator needs a maximum of
a) 3m bits b) 3m + n bits
c) m + n bits d) none of the above

Q.40 In the previous problem, if the computer is word addressable with
the word size being 8 bytes then the answer will be
a) 3m bits b) 3m + n bits
c) m + n bits d) none of the above
Q.41 The number of columns in a state table for a sequential circuit
with 'm' flip-flops and 'n' inputs is
a) m + n b) m + 2n
c) 2m + n d) 2m + 2n

Q.42 A computer uses the ternary system instead of the traditional
binary system. An 'n' bit string in the binary system will occupy
a) 3 + n ternary digits
b) 2n/3 ternary digits
c) n(log_2 3) ternary digits
d) n(log_3 2) ternary digits

Q.43 The Boolean expression A'BE + BCDE + BC'D'E + A'B'DE' + B'C'DE'
can be simplified to BE + B'DE', if the don't care conditions are
a) ABCDE + AB'CDE'
b) ABCDE + AB'CDE' + ABCD'E
c) ABC'DE + AB'CDE + ABCD'E
d) none of the above

Q.44 The decimal equivalent of the binary number 101.101 is
a) 5.6249 b) 5.625
c) 5.5 d) 5.25

Q.45 Which of the following does not have 8 data lines?
a) 8085 b) 8086
c) 8088 d) Z-80

Q.46 Which of the following logic families is well suited for high
speed operation?
a) TTL b) ECL
c) MOS d) CMOS

Q.47 The following arrangement of the JK flip-flops does the function
of a
a) Shift register
b) Mod-3 counter
c) Mod-2 counter
d) none of the above

Q.48 Negative numbers cannot be represented in
a) Signed magnitude form
b) 1's complement form
c) 2's complement form
d) none of the above

Q.49 The addressing mode used in the instruction of the form ADD X Y is
a) absolute b) immediate
c) indirect d) index

Q.50 The combinational circuit in fig. below can be replaced by a
single
a) OR gate b) XOR gate
c) NOR gate d) AND gate

Q.51 (10110011100011110000)_2 in base 32 is
a) 22 14 7 16 b) 11 9 23 31
c) 11 9 7 16 d) 11 14 23 16

Q.52 The XOR operator ⊕ is
a) commutative
b) associative
c) distributive over AND operator
d) none of the above

Q.53 Bubble memories are preferable to floppy disks because
a) of their higher transfer rate
b) the cost needed to store a bit is less
c) they consume less power
d) of their reliability

Q.54 Addressing capability of 8086/88 is
a) 64 K b) 512 K
c) 2 MB d) 1 MB

Q.55 The following circuit produces the output sequence
a) 1111 1111 0000 0000
b) 1111 0000 1111 000
c) 1111 0001 0011 010
d) 1010 1010 1010 1010

Q.56 Which of the following units can be used to measure the speed of
a computer?
a) SYPS b) MIPS
c) BAUD d) FLOPS

Q.57 If A ⊕ B = C (⊕ stands for the XOR operator), then
a) A ⊕ C = B
b) B ⊕ C = A
c) A ⊕ B ⊕ C = 0
d) none of the above

Q.58 Which of the following operation(s) is/are not closed as regards
to computers?
a) Addition b) Subtraction
c) Multiplication d) Division

Q.59 If (11A1B)_8 = (12C9)_16 (C stands for decimal 12), then the
values of A and B are
a) 5, 1 b) 7, 5
c) 5, 7 d) none of the above

Q.60 The total number of possible Boolean functions involving 'n'
Boolean variables is
a) infinitely many
b) n^n
c) n^2
d) none of the above

Q.61 Which of the following architectures is/are not suitable for
realizing SIMD?
a) Vector processor
b) Array processor
c) Von Neumann
d) All of the above

Q.62 How many 2-input multiplexers are required to construct a
2^10-input multiplexer?
a) 1023 b) 31
c) 10 d) 127

Q.63 Let A be a set having 'n' elements. The number of binary
operations that can be defined on A is
a) n^(n^2) b) 2^(n^n)
c) n^(2^n) d) 2^(2^n)

Q.64 The value of x and y, if (x567)_8 + (2yx5)_8 = (71yx)_8 is
a) 4, 3 b) 3, 3
c) 4, 4 d) 4, 5

Q.65 The number of instructions needed to add 'n' numbers and store the
result in memory using only one address instructions is
a) n b) 60
c) 70 d) 75

Q.66 The number of instructions needed to add 'n' numbers and store the
result in memory using only one address instructions is
a) n
b) n - 1
c) n + 1
d) independent of n

Q.67 The Boolean expression corresponding to the circuit in the figure
below is
a) a tautology
b) an inconsistency
c) independent of A
Q.68 The clock of a microprocessor can be divided by 5 using a
a) 3 bit counter b) 5 bit counter
c) mod 5 counter d) mod 3 counter

Q.69 The minimum cover for the maximum compatibility classes
{ae, acd, ad, bd} is
a) ae, acd, ad b) acd, ad, bd
c) ae, acd, bd d) ae, ad, bd

Q.70 The values of a, x, y if 47x80 is the 10's complement of yaya0 are
a) 4, 3, 2 b) 5, 4, 4
c) 3, 4, 5 d) 2, 4, 5

Q.71 The reason for the presence of the ALE pin in 8085, but not in
6800, is that
a) 8085 uses I/O mapped I/O
b) 876 ms
c) 850 ms
d) 900 ms

Q.72 If memory access takes 20 ns with cache and 110 ns without it,
then the hit-ratio (cache uses a 10 ns memory) is
a) 93 % b) 90 %
c) 87 % d) 88 %

Q.73 In which of the following instructions does the bus idle situation
occur?
a) EI b) DAD rp
c) INX H d) DAA

Q.74 Any instruction should have at least
a) 2 operands
b) 1 operand
c) 3 operands

Q.75 Consider the circuit in Fig. below. In order to make it a
tautology the '?' marked box should be replaced by
a) an OR gate b) an AND gate
c) a NAND gate d) a NOR gate

Q.76 If the cache needs an access time of 20 ns and the main memory
120 ns, then the average access time of a CPU is (assume hit-ratio is
80%)
a) 30 ns b) 40 ns
c) 35 ns d) 45 ns

Q.77 The number of clock cycles necessary to complete 1 fetch cycle in
8085 (excluding wait states) is
a) 3 or 4 b) 4 or 5
c) 4 or 6 d) 3 or 5

Q.78 The seek time of a disk is 30 ms. It rotates at the rate of 30
rotations per second. Each track has a capacity of 300 words. The
access time is approximately
a) 47 ms b) 50 ms
c) 60 ms d) 62 ms

Q.79 Motorola's 68040 is comparable to
a) 8085 b) 80286
c) 80386 d) 80486

Q.80 The possible number of Boolean functions of 3 variables X, Y and Z
such that f(X, Y, Z) = f(X', Y', Z') is
a) 8 b) 16
c) 64 d) 32

Q.81 Which of the following interrupts is both level and edge
sensitive?
a) RST 5.5 b) INTR
c) RST 7.5 d) TRAP

Q.82 The difference between 80486 and 80386 is/are
a) presence of floating point co-processor
b) speed of operation
c) presence of 8 K cache on chip
d) presence of memory controller
Q.83 The addressing mode used in the instruction PUSH B is
a) direct b) register
c) register indirect d) immediate

Q.84 The most relevant addressing mode to write position independent
code is
a) direct mode
b) indirect mode
c) relative mode
d) indexed mode

Q.85 Which of the following are CISC machines?
a) IBM 360 b) 80386
c) 68030 d) none of the above

Q.86 Which of the following rules regarding the addition of 2 given
numbers is correct, if negative numbers are represented in 2's
complement form?
a) Add sign bit and discard carry, if any
b) Add sign bit and add carry, if any
c) Don't add sign bit and discard carry, if any
d) Don't add sign bit and add carry, if any

Q.87 When INTR is encountered, the processor branches to the memory
location, which is
a) 0024H
b) determined by the 'call address' instruction issued by the I/O
device
c) determined by the 'RST n' instruction issued by the I/O device
d) all of the above

Q.88 The advantage of a single bus over a multi-bus is the
a) low cost
b) flexibility in attaching peripheral devices
c) high operating speed
d) all of the above

Q.89 The number of possible Boolean functions that can be defined for n
Boolean variables over n-valued Boolean algebra is
a) 2^(2^n) b) 2^(n^2)
c) n^(2^n) d) n^(n^n)

Q.90 The ASCII code 56 represents the character
a) V b) 8
c) a d) carriage return

Q.91 Parallel printer uses
a) RS-232C interface
b) Centronics interface
c) hand-shake mode
d) synchronous data transfer mode

Q.92 A micro programmed control unit
a) is faster than a hard-wired control unit
b) facilitates easy implementation of new instructions
c) is useful when very small programs are to be run
d) usually refers to the control of a microprocessor

Q.93 Which of the following are typical characteristics of a RISC
machine?
a) Instructions taking multiple cycles
b) Highly pipelined
c) Instructions interpreted by micro programs
d) multiple register sets

Q.94 The working of a staircase switch is a typical example of the
logical operation
a) OR b) NOR
c) Exclusive-OR d) Exclusive-NOR

Q.95 The exponent of a floating-point number is represented in excess-N
code so that
a) the dynamic range is large
b) the precision is high
c) the smallest number is represented by all zeros
d) overflow is avoided

Q.96 On receiving an interrupt from an I/O device, the CPU
a) halts for a predetermined time
b) hands over control of address bus and data bus to the interrupting
device
c) branches off to the interrupt service routine immediately
d) branches off to the interrupt service routine after completion of
the current instruction

Q.97 The Karnaugh map for the Boolean function F of 4 Boolean variables
is given in Fig. below. A, B, C are don't care conditions. What values
of A, B, C will result in the minimum expression?
a) A = B = C = 1 b) B = C = 1; A = 0
c) A = C = 1; B = 0 d) A = B = 1; C = 0

Q.98 In serial communication, an extra clock is needed
a) to synchronize the devices
b) for program baud rate control
c) to make efficient use of RS-232
d) none of the above

Q.99 If negative numbers are stored in 2's complement form, the range
of numbers that can be stored in 8 bits is
a) -128 to +128 b) -128 to +127
c) -127 to +128 d) -127 to +127

Q.100 If SUB A, B means B - A, then SUB 4(R0), *5(R1) means ((X) means
contents of register or memory location X)
a) (((R1) + 5)) - (4 * (R0))
b) (((R1) + 5)) - ((R0) + 4)
c) ((R1) + 5) - (4 * (R0))
d) ((R1) + 4) - (R0 + 4)
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(c) (b) (a) (c) (a) (a) (b) (b) (a) (b) (a) (d) (b) (c)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(a) (a) (b) (a) (a) (c) (b) (a) (c) (a) (a) (b) (c) (a)
29 30 31 32 33 34 35 36 37 38 39 40 41 42
(c) (a) (a) (c) (b) (d) (a) (a) (a) (a) (d) (d) (c) (d)
43 44 45 46 47 48 49 50 51 52 53 54 55 56
(c) (b) (b) (b) (b) (d) (a) (d) (a) (a) (c) (d) (c) (b)
57 58 59 60 61 62 63 64 65 66 67 68 69 70
(a) (a) (d) (d) (c) (a) (a) (a) (d) (c) (a) (c) (c) (d)
71 72 73 74 75 76 77 78 79 80 81 82 83 84
(c) (b) (b) (d) (c) (b) (c) (a) (d) (b) (d) (a) (c) (c)
85 86 87 88 89 90 91 92 93 94 95 96 97 98
(a) (a) (b,c) (a) (d) (b) (b) (b) (b) (c) (c) (d) (d) (b)
99 100
(b) (b)
EXPLANATIONS
Q.2 (b)
Exclusive-OR takes the value 0 if there is an even number of 1's.

Q.3 (a)
x can take the value of either 1 or 0. Substitute and verify the
identities.

Q.4 (c)
Form the truth table and check the correctness of options (c) and (d).

Q.8 (b)
Substitute and verify each of the possibilities.

Q.9 (a)
During execution of the current instruction the content is incremented
so that it points to the next instruction.

Q.10 (b)
Converting to decimal form, the given equation is
3 + (2 × 5) + (1 × 5 × 5) = 3 + A × B, i.e., 38 = A × B + 3.
So, A × B = 35.
Possible values for (A, B) are (1, 35), (5, 7), (7, 5) and (35, 1).
Of these, (7, 5) and (35, 1) are infeasible, as the permissible digits
for a number in base 'r' are 0, 1, 2, ..., (r - 1). Hence 1 and 5 are
the possible values of A.

Q.12 (d)
Refer Qn. 10. Converting to decimal form,
A + 2 × 3 + 1 × 3 × 3 = 3 + 2 × A + 1 × A × A.
Solving for A, we get A = -4 or 3. Both are infeasible.

Q.13 (b)
The contents of a word may represent an instruction or data. Just by
looking at the contents, it is not possible to attach any meaning to
it. A word pointed to by the program counter is an instruction.
Otherwise it need not be. Also, the word data has a context sensitive
meaning. One can write a program in Pascal that needs the radius as the
input data. The program, as a whole, is input data for the compiler
during the compilation process.

Q.16 (a)
X + X'Y = X . 1 + X'Y
= X (1 + Y) + X'Y
= X . 1 + XY + X'Y
= X + (X + X') Y
= X + 1 . Y = X + Y
If that sounds quite unnatural, here is another way. Let
K = X + X'Y (we have to find K).
Complementing both sides,
K' = (X + X'Y)' = X' (X + Y')
= X'X + X'Y' = 0 + X'Y'
= X'Y'
Again complementing both sides, K = (X'Y')' = X + Y.
Hence the answer is (a).

Q.17 (b)
Obviously it shows that OR is associative. It also implies (by the law
of duality) the associativity of AND. Complementing both sides,
(X + (Y + Z))' = ((X + Y) + Z)'
X'(Y'Z') = (X'Y')Z' (by De Morgan's law)

Q.22 (a)
A Karnaugh map is just a pictorial representation of the truth table.
By covering the 1's, we get the sum of products form. By covering the
0's and then complementing, we get the product of sums form.

Q.23 (c)
Consider the decimal digit 5. Its BCD representation is 0101. If
complemented, we get 1010, i.e., 15 - 5. In general, complementing x
gives 15 - x. But the correct complemented value should be 9 - x. The
difference of 6 can be nullified by going for excess-3 code (3 because,
using it twice, i.e., during the conversion and reconversion process,
one can account for the excess 6). If a number system uses 20 as the
radix, each digit needs 5 bits in the equivalent binary coded form. So,
the complement of x gives 31 - x. But the correct value is 19 - x. To
account for the excess 31 - 19, i.e., 12, we have to use excess-6 code.
E.g., take 11; its complement should be 19 - 11 = 8. In excess-6 code,
we add 6 to 11 to get 17. Complementing, we get 31 - 17 = 14. If we
subtract the excess 6, we get 14 - 6 = 8, which is the required answer.

Q.24 (a)

Q.25 (a)
By NAND gates as follows:
A + B = NAND (NAND (A, A), NAND (B, B))
By NOR gates as follows:
A + B = NOR (NOR (A, B), NOR (A, B))

Q.28 (a)
90-10 is a heuristic rule that says 90 % of the execution time is spent
on 10 % of the code.

Q.29 (c)
Consider the decimal digit 5. Its BCD form is 0101. Complementing, we
get 1010, which is decimal 10. To make 1010 correspond to decimal 4
(which is the correct complement of 5), we can assign the weights
2-4-2-1. This way 1010 will be decimal 4.

Q.31 (a)
NOR and NAND are universal gates. NAND can be simulated by NOR as
follows.
NAND (A, B) = (AB)' = A' + B'
NOR (A, A) = A'
NOR (B, B) = B'
NOR (A', B') = (A' + B')'
= AB
NOR (AB, AB) = (AB)'
= A' + B' = NAND (A, B)
So, it suffices to prove that NAND is a universal gate.
If that is true, it should be able to simulate any Boolean operator,
i.e., OR, AND and complementation. For OR, refer Qn. 25 to see how OR
can be simulated.
It is simple to simulate complementation.
NAND (A, A) = A'
AND can be simulated as follows.
NAND (A, B) = (AB)'
NAND ((AB)', (AB)') = AB
Hence the correct answers are (a) and (b).

Q.32 (c)
Don't care conditions need or need not be present. If present, they
need or need not be used. If they aid in the simplification process, we
use them to our advantage. Otherwise they are literally don't care.

Q.33 (b)
AB + AB' + A'C + AC
= A (B + B') + (A' + A) C
= A (1) + (1) C = A + C, which is independent of B.

Q.34 (d)
The given expression is AB + AB' + A'C = A (B + B') + A'C = A (1) + A'C
= A + A'C = A + C (Refer Qn. 16)
So, one needs just a single OR gate to implement the given Boolean
expression.

Q.38 (a)
(A + C) (AB' + AC)
= AAB' + AAC + ACB' + CAC
= AB' + AC + CAB' + AC
(since X . X = X)
= AB' + CAB' + AC
(since X + X = X)
So the given Boolean expression is
(AB' + CAB' + AC)(A'C' + B')
= AB'A'C' + AB'B' + CAB'A'C' + CAB'B' + ACA'C' + ACB'
= 0 + AB' + 0 + CAB' + 0 + ACB'
= AB' + ACB' = AB' (1 + C)
= AB'

Q.39 (d)
To specify a particular operation out of the 2^n possible operations,
one needs n bits. As the machine is byte addressable with 2^m Kbytes,
i.e., 2^(m + 10) bytes, to specify a particular byte we need (m + 10)
bits. So an instruction with 3 operands and one operator needs
3 (m + 10) + n = 3m + n + 30 bits.

Q.40 (d)
Refer Qn. 39.
If it is word addressable, then the number of words is 2^(m + 10)
divided by 2^3, i.e., 2^(m + 7) words.
So, one needs 3 (m + 7) + n
= 3m + n + 21 bits.

Q.41 (c)
It is 2m + n: 'n' columns for the 'n' inputs; 2m columns for storing
the 'm' present states and 'm' next states.

Q.43 (c)
The term A'BE corresponds to A = 0; B = 1; E = 1; C = 0 or 1;
D = 0 or 1. Similarly mark all the 1's and get the Karnaugh map. The
1's can be covered in the optimal way if the slots marked X are set to
1's. So the three X's in the positions ABCD'E, ABC'DE, AB'CDE' are the
don't care conditions to be set to 1 and used. Hence the answer is (c).

Q.50 (d)
The circuit is (A' + B')' = A . B.

Q.51 (a)
To convert to base 8, we group in 3's, because 2^3 = 8.
To convert to base 16, we group in 4's, because 2^4 = 16.
To convert to base 32, we group in 5's, because 2^5 = 32.
Grouping in 5's from the right, we can get the answer.

Q.52 (a)
It is commutative because A ⊕ B = B ⊕ A.
It is associative because (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).
It is not distributive over AND because
A ⊕ (B AND C) = (A ⊕ B) AND (A ⊕ C) is not true. For e.g.,
1 ⊕ (0 AND 1) = 1
But (1 ⊕ 0) AND (1 ⊕ 1) = 1 AND 0 = 0

Q.57 (a)
X ⊕ X = 0 (construct the truth table and verify).
So, A ⊕ B = C
⇒ A ⊕ (A ⊕ B) = A ⊕ C
⇒ (A ⊕ A) ⊕ B = A ⊕ C
⇒ 0 ⊕ B = A ⊕ C
⇒ B = A ⊕ C
Similarly, (b) and (c) can be proved.

Q.59 (d)
Converting to base 2, the equation reads
001 001 A 001 B = 0001 0010 1100 1001
Here A, B stand for a group of binary digits. So, grouping the right
hand side in 3's, from the right and matching corresponding
groups in both the sides, we get B = 001 and A = 011. So, A = 3 and
B = 1.

Q.60 (d)
A single Boolean variable can take the value either 0 or 1, i.e., 2
possible ways. So, 'n' Boolean variables can take 2 × 2 × 2 ... (n
times) values, i.e., 2^n combinations. So, the truth table will have
2^n rows. Each row can be assigned one of the 2 values 0 or 1. So,
totally 2^(2^n) functions are possible. So, none of the given options
is correct.

Q.62 (a)
A 2-input multiplexer can select a single line out of the two input
lines. To select a single line out of the 2^10, i.e., 1024 input lines,
we have to use 1023 two-input multiplexers. In order to select 512
lines out of the 1024, we need 512 two-input multiplexers. Continuing
this way, to ultimately get a single line, we need a total of
512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 1023 two-input
multiplexers.

Q.63 (a)
By definition a binary operator defined on a set A is a function
F : A × A → A. The domain, i.e., A × A, has n × n elements (because A
has n elements). Each of these n^2 elements can be mapped to one of the
'n' elements of A. So, totally n^(n^2) binary operations are possible.

Q.64 (a)
Add 7 and 5. It yields 4 and carries 1 (since it is an octal addition).
So, x is 4. Similar reasoning, after substituting x = 4, yields y = 3.

Q.65 (d)
The number M will be such that 10^24 ≤ M < 10^25.
A decimal number x needs approximately log x (log is to base 2) number
of bits. So the binary representation of M needs more than log 10^24
bits, but less than log 10^25 bits. log 10^24 is greater than log 8^24
(= log 2^72 = 72 log 2 = 72). So, more than 72 bits are needed. The
nearest answer is 75.

Q.66 (c)
A typical one address instruction uses that address to specify one
operand; the other operand will be in the accumulator by default. So,
to add n given numbers a1, a2, ..., an, first transfer a1 to the
accumulator. Next, the instruction ADD a2 adds the content of a2 to the
accumulator and leaves the sum there. Continuing this way, we need n
instructions to add n numbers and place the result in the accumulator.
Finally, to store the result in memory 1 more instruction is needed.
So, (n + 1) instructions are needed.

Q.67 (a)
The inputs to the NAND gate are (A + B')' and (A' + B)', i.e., A'B and
AB'. So, the output of the NAND gate will be (A'B . AB')' = 0' = 1. So
F is always 1. Hence it is a tautology.

Q.69 (c)
Let us denote the given classes in a tabular form as follows:

      a  b  c  d  e
ae    1  0  0  0  1
acd   1  0  1  1  0
ad    1  0  0  1  0
bd    0  1  0  1  0

The 'e' column has only one 1. That corresponds to ae. So, ae has to be
present in the minimal cover. Analyzing the 'c' and 'b' columns, we
find acd and bd have to be included. Hence the answer.
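The counting arguments in Q.60, Q.62, Q.63 and Q.66, and the base conversions in Q.59 and Q.64, can be sanity-checked by direct computation. The following Python sketch is only an illustration; the helper names (boolean_function_count, binary_operation_count, mux2_count, one_address_instructions) are hypothetical and do not come from the original text, and mux2_count assumes the number of inputs is a power of 2.

```python
def boolean_function_count(n):
    """Q.60: a truth table with 2^n rows, each row free to be 0 or 1."""
    return 2 ** (2 ** n)


def binary_operation_count(n):
    """Q.63: each of the n*n pairs in A x A maps to one of n elements."""
    return n ** (n * n)


def mux2_count(inputs):
    """Q.62: 2-input multiplexers in a tree choosing 1 of `inputs` lines.

    Assumes `inputs` is a power of 2; each level halves the line count.
    """
    count = 0
    while inputs > 1:
        inputs //= 2
        count += inputs
    return count


def one_address_instructions(n):
    """Q.66: one load, (n - 1) adds, one store on an accumulator machine."""
    return 1 + (n - 1) + 1


if __name__ == "__main__":
    print(boolean_function_count(3))   # the 256 functions referred to in Q.80
    print(mux2_count(2 ** 10))
    # Q.64 with x = 4, y = 3: (4567)_8 + (2345)_8 should equal (7134)_8
    print(int("4567", 8) + int("2345", 8) == int("7134", 8))
    # Q.59: the octal form of (12C9)_16 gives the digits A and B
    print(oct(int("12C9", 16)))
```

Python integers have arbitrary precision, so the doubly exponential counts can be evaluated exactly for small n without overflow.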
Q.70 (d)
From the definition of 10's complement,
10^5 = 47x80 + yaya0
So, a is 2; y is 5; x is 4.

Q.72 (b)
Let m be the hit ratio. Then 20 = 10 × m + (1 - m) × 110.
Solving, we get m = 0.9, i.e., 90 %.

Q.74 (d)
Operations on a stack need no operand address, as only the top of the
stack can be accessed. The top of the stack will be stored in a
register dedicated for this purpose. Only the operation (e.g. POP)
needs to be specified.

Q.75 (c, d)
To make F a tautology, its value has to be 1 for all possible inputs.
One input is (X + Y). If the other input is X', Y', or (X + Y)', then
F = 1, always.
If it is a NAND gate, we have X + Y + X' + Y' = 1, always.
If it is a NOR gate, we have X + Y + X'Y' = 1, always.

Q.76 (b)
Average access time is (0.8 × 20) + (1 - 0.8) × 120 = 40 ns.

Q.78 (a)
Access time is seek time plus latency time. Seek time is the time taken
by the read-write head (RWH) to get onto the right track. Latency time
is the time taken by the RWH to position itself at the right sector, so
that the actual transfer can take place. Here a track has 300 words.
So, on average, to position itself at the right word, the RWH should
traverse 150 words. The time taken for this will be
150/(30 × 300) sec = 17 ms (approximately). So, the access time will be
30 + 17 = 47 ms.

Q.80 (b)
In total, 256 functions are possible (refer Qn. 60). We have to find
how many of these 256 functions satisfy the condition
f(X, Y, Z) = f(X', Y', Z'),
e.g., if f(0, 1, 1) = 0 then f(1, 0, 0) has to be 0.
This constraint leaves only half of the truth table, i.e., 4 rows, free
to take independent values. So we have 2^4 = 16 possible functions.

Q.89 (d)
There are n Boolean variables. Each can take one of the n possible
values 0, 1, 2, ..., n-1. So, the truth table will have n^n rows. Now,
each row can take one of the n values as its output value. So, the
possible number of functions is n × n × n ... (n^n times), i.e.,
n^(n^n).

Q.94 (c)
Exclusive-OR takes the value 1 if there is an odd number of 1's in the
input, and 0 otherwise. Consider the working of a staircase switch. It
can be operated by two switches, say A and B.
If A = 0 and B = 0, the light will be off (i.e., 0).
If A = 1 and B = 0, the light will be on (i.e., 1).
If A = 0 and B = 1, the light will be on (i.e., 1).
If A = 1 and B = 1, the light will be off (i.e., 0).
The truth table of the Exclusive-OR operation is the very same.

Q.97 (d)
If A = B = 1 and C = 0, we can cover all the 1's in the best possible
way, as given in the diagram.
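The arithmetic in Q.72, Q.76 and Q.80 can be reproduced with a minimal Python sketch. The function names are hypothetical and chosen only for illustration. The access-time model used is the one the solutions themselves use: hit ratio times hit time, plus miss ratio times miss time.

```python
from itertools import product


def average_access_time(hit_ratio, hit_time, miss_time):
    """Average access time: hit_ratio * hit_time + (1 - hit_ratio) * miss_time."""
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time


def hit_ratio_for(target_time, hit_time, miss_time):
    """Solve target = m * hit_time + (1 - m) * miss_time for the hit ratio m."""
    return (miss_time - target_time) / (miss_time - hit_time)


def count_complement_invariant_functions(num_vars=3):
    """Count Boolean functions f with f(x1..xk) = f(x1'..xk') by enumeration."""
    inputs = list(product((0, 1), repeat=num_vars))
    count = 0
    for outputs in product((0, 1), repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        # Keep f only if each row agrees with its bitwise-complemented row.
        if all(table[row] == table[tuple(1 - b for b in row)] for row in inputs):
            count += 1
    return count


if __name__ == "__main__":
    print(hit_ratio_for(20, 10, 110))            # Q.72: hit ratio m
    print(average_access_time(0.8, 20, 120))     # Q.76: time in ns
    print(count_complement_invariant_functions())  # Q.80: number of functions
```

Floating-point rounding means the printed values may differ from 0.9 and 40 in the last decimal places. The enumeration for Q.80 walks all 256 truth tables, which confirms the "half the rows are free" argument directly.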
