COA - Bindu Agarwalla Notes
Syllabus
CS 2006 COMPUTER ORGANIZATION AND ARCHITECTURE Cr- 4
Activities: 30 Marks
Course Handout, Computer Architecture, CS-2006, Sec: CSSE-2
Faculty Name: Bindu Agarwalla
Mid Semester Exam: 20 Marks
End Semester Exam: 50 Marks
__________
Total 100 Marks
Computer Organization and
Architecture(CS 2006)
Bindu Agarwalla
COA
1. Computer??
2. Organization??
3. Architecture??
A computer is an electronic device that accepts input from the outside world,
processes it according to predefined instructions, and produces output for the
outside world.
Computer
1. Input Devices
2. Memory
3.Processor
4. Output Devices
Memory
2. Memory
Primary Memory: RAM and ROM
Meaning of Memory capacity
bit
byte
nibble
word
Secondary memory
One instruction requires 7 clock cycles to complete its execution. How much time
is required for that instruction if the processor speed is 5 GHz?
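The question above can be worked out as time = cycles / clock rate. A minimal sketch, assuming "5 GHz" means 5 x 10^9 cycles per second; the function name is only illustrative:

```python
def execution_time_seconds(cycles: int, clock_hz: float) -> float:
    """Time taken by an operation that needs `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz


# 7 clock cycles on a 5 GHz processor
t = execution_time_seconds(7, 5e9)
print(f"{t * 1e9:.1f} ns")
```

One clock cycle at 5 GHz lasts 0.2 ns, so 7 cycles take 1.4 ns.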
Registers
1. Dedicated:
PC
MAR
MDR
IR
SP
MDR: Memory Data Register. It contains the data that is being read from
memory/ the data that is being written into memory during a write operation.
4. [MDR]→ IR
5. Decode the instruction.
2 times.
1. To fetch the instruction.
2. To fetch the operand from memory location A
Basic Operational Concept
Example1 : ADD R1, A
Steps:
1. [PC]→ MAR
2. Generate the 'Read' Signal.
4. [MDR]→ IR
5. Decode the instruction.
3 times.
1. To fetch the instruction.
2. To fetch the operand from memory location A
3. To store the result into memory location A
A processor has 48-bit instructions composed of two fields: the first two bytes
contain the opcode and the remainder a memory operand address. How many bits
are needed for the Program Counter and the Instruction Register?
Questions
a)At the end of a memory read operation, the MDR is loaded with a
binary combination, how that combination is interpreted as an instruction
or an operand to an instruction?
CPU Organization
1.Single Accumulator Organization (One Address Instruction)
To understand the topic we will take an example and will discuss all the
organizations.
X=(A+B) * (C+D)
CPU Organization
1.Three Address Instruction:X=(A+B) * (C+D)
General format:
Opcode destination, src1, src2
Note: The order of operands varies from architecture to architecture.
General format:
Opcode src1/dest, src2
Note: The order of operands varies from architecture to architecture.
LOAD A //Acc←Mem[A]
LOAD C //Acc←Mem[C]
ADD R1, P, Q
MUL R1, R1, R
DIV R1, R1, S
SUB R2, T, U
DIV R2, R2, V
ADD X, R1, R2
CPU Organization
1.TWO Address Instruction: X:=(P+Q) * R / S + (T-U) / V
MOV R1, P
ADD R1, Q
MUL R1, R
DIV R1, S
MOV R2, T
SUB R2, U
DIV R2, V
ADD R1, R2
MOV X, R1
CPU Organization
1.One Address Instruction: X:=(P+Q) * R / S + (T-U) / V
LOAD P
ADD Q
MUL R
DIV S
STORE TEMP
LOAD T
SUB U
DIV V
ADD TEMP
STORE X
CPU Organization
1.ZERO Address Instruction: X:=(P+Q) * R / S + (T-U) / V
PUSH P
PUSH Q
ADD
PUSH R
MUL
PUSH S
DIV
PUSH T
PUSH U
SUB
PUSH V
DIV
ADD
POP X
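The stack-based evaluation of X := (P+Q) * R / S + (T-U) / V can be sketched with a tiny interpreter. This is only an illustration: the function `run_zero_address`, the dictionary used as memory and the sample operand values are assumptions, not part of the notes.

```python
def run_zero_address(program, memory):
    """Execute PUSH/POP and zero-address arithmetic on an operand stack."""
    operations = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b,
        "DIV": lambda a, b: a / b,
    }
    stack = []
    for line in program:
        parts = line.split()
        opcode = parts[0]
        if opcode == "PUSH":
            stack.append(memory[parts[1]])      # push Mem[operand]
        elif opcode == "POP":
            memory[parts[1]] = stack.pop()      # Mem[operand] <- top of stack
        else:
            right = stack.pop()                 # top of stack is the second operand
            left = stack.pop()
            stack.append(operations[opcode](left, right))
    return memory


program = [
    "PUSH P", "PUSH Q", "ADD", "PUSH R", "MUL", "PUSH S", "DIV",
    "PUSH T", "PUSH U", "SUB", "PUSH V", "DIV", "ADD", "POP X",
]
# Sample values chosen for illustration only.
memory = {"P": 2, "Q": 4, "R": 3, "S": 6, "T": 9, "U": 1, "V": 4}
run_zero_address(program, memory)
print(memory["X"])
```

With these sample values, (2+4)*3/6 = 3 and (9-1)/4 = 2, so X becomes 5.0. The arithmetic instructions carry no address at all; every operand is taken from the top of the stack.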
CPU Organization
1. RISC Instruction: X:=(P+Q) * R / S + (T-U) / V
LOAD R1, P
LOAD R2, Q
ADD R1, R1, R2
LOAD R2, R
MUL R1, R1, R2
LOAD R2, S
DIV R1, R1, R2
LOAD R2, T
LOAD R3, U
SUB R2, R2, R3
LOAD R3, V
DIV R2, R2, R3
ADD R1, R1, R2
STORE X, R1
Organization Vs Architecture
Computer Architecture is the design of the system, visible to the
assembly language programmer
Implementation technology
All computers in the Intel Pentium series have the same architecture
but each version of Pentium has a different organization or
implementation
Von Neumann's Stored Program Concept
John von Neumann designed a machine at the Institute for Advanced Study
between 1945 and 1952, which is known as the stored-program digital computer.
Harvard Architecture.
Basic Performance Equation
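Program execution time (T) = (N x S)/R, where N is the number of instructions executed, S is the average number of clock cycles per instruction (CPI) and R is the clock rate.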
Mode = 1 bits
Register = 6 bits
Total=25 bits
op code =32-25=7 bits
Problem
a)A computer has 64-bit instructions and 12 bit addresses. If
there are 352 three-address instructions, and 2256 no of
two-address instructions then how many one-address instructions
can be formulated?
Addressing Modes
Consider the following program segment. Here R1, R2 and R3 are the
general purpose registers.
      Instruction        Operation
      MOV R1, (3000)     R1←M[3000]
LOOP: MOV R2, (R3)       R2←M[R3]
      ADD R2, R1         R2←R1+R2
      MOV (R3), R2       M[R3]←R2
      INC R3             R3←R3+1
      DEC R1             R1←R1-1
      BNZ LOOP           Branch on not zero
      HALT               Stop
Assume that the content of memory location 3000 is 10 and the
content of the register R3 is 2000. The content of each of the
memory locations from 2000 to 2010 is 100. The program is loaded
from the memory location 1000. All the numbers are in decimal.
Assume that the memory is word addressable. How many memory
references are made for accessing data when the program is executed
completely?
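One way to check the count is to simulate the program segment and increment a counter on every data access (instruction fetches are not counted, since the question asks only about data). A minimal sketch; the function name and the dictionary used as memory are illustrative assumptions:

```python
def count_data_references():
    """Simulate the loop and count memory reads/writes of data."""
    memory = {3000: 10}
    memory.update({address: 100 for address in range(2000, 2011)})
    references = 0

    r3 = 2000
    r1 = memory[3000]          # MOV R1, (3000): one data read
    references += 1
    while True:
        r2 = memory[r3]        # MOV R2, (R3): one data read
        references += 1
        r2 = r2 + r1           # ADD R2, R1: registers only
        memory[r3] = r2        # MOV (R3), R2: one data write
        references += 1
        r3 += 1                # INC R3
        r1 -= 1                # DEC R1
        if r1 == 0:            # BNZ LOOP falls through when R1 is zero
            break
    return references, memory


references, final_memory = count_data_references()
print(references)
```

The loop body runs 10 times with 2 data references each, plus 1 reference for the initial load, giving 1 + 2 x 10 = 21.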
Addressing Modes
The addressing mode refers to the way in which the operand of an
instruction is specified.
Implied Mode: In this mode the operands are specified implicitly in the
definition of the instruction.
MOV #200, R1
ONE
Addressing Modes
Register Mode: In this mode, the operand is specified as the content of a
general purpose register.
Ex: MOV R1,R2
ONE
Addressing Modes
Direct Mode
In this mode, the operand is in memory, and the address of the
operand is specified in the instruction itself.
Ex: MOV NUM , R2
MOV 2000, R2
Note : The address can be specified either as a numeric value or as a symbolic
one.
R2
56
2000 56 NUM
No of memory references to execute the above
instruction is??
two
Memory
Addressing Modes
Register Indirect Mode
R1 R2
1000 56
1000 56
No of memory references to execute the above
instruction is??
two
Memory
Addressing Modes
Memory Indirect Mode:
In this mode, the operand is there in the memory, whose address is specified as the
content of another memory location in the instruction. i.e., the address of the address
of the operand is specified in the instruction.
R2
500 1000 num
56
1000 56
No of memory references to execute the above
instruction is??
three
Memory
Problem
Program to add N numbers stored in memory location starting from
NUM1 LOCATION
MOVE N, R1
MOVE #NUM1, R2
CLEAR R0
LOOP ADD (R2), R0
ADD #4, R2
DECREMENT R1
BRANCH >0 LOOP
MOVE R0, SUM
Note: In the effective address generation, index register content is not modified. It
is only used in the process.
R1 R2
1000 1000
78
N n
LIST student id
LIST+4 test 1 Student 1
LIST+8 test 2
LIST+12 test 3
student
Student 2
test 1
test 2
test 3
.
.
If the data of the 1st student is stored in memory from location 1000, then the data of the
next student will be found at location 1016.
Problem
Program to add the average of score of three tests for a class having
N number of students.
MOVE #LIST, R0
CLEAR R1
CLEAR R2
CLEAR R3
MOVE N, R4
Other variants:
1. (Ri,Rj)
EA=[Ri]+[Rj]
2. X(Ri,Rj)
EA=X+[Ri]+[Rj]
Addressing Modes
Relative Mode: In this mode, the operand is there in the memory, whose address
is the sum of the offset and content of index register. Offset is specified in the
address of the instruction. offset represents relative displacement. i.e., how far the
operand is located from the base.
Ex: Branch > 0 LOOP
Relative Mode
In this mode, the effective address is generated using the offset and the
contents of the Program Counter (PC).
Ex: Branch > 0 LOOP
            MOVE N, R1
            MOVE #NUM1, R2
            CLEAR R0
1000: LOOP  ADD (R2), R0
1004        ADD #4, R2
1008        DECREMENT R1
1012        BRANCH >0 LOOP
1016        MOVE R0, SUM
Here, when the branch instruction is executed, the value of PC will be 1016
(the address of the next instruction). To jump to the branch target (the ADD
instruction), PC must be set to 1000, so -16 is added to 1016 to get 1000.
This -16 is the offset that the label of the branch instruction stands for;
here the label is LOOP, i.e., LOOP is represented as -16.
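The offset calculation described above can be sketched as follows, assuming 4-byte instructions (as in the address column of the listing) and a PC that already points to the next instruction when the branch executes. The function name is illustrative:

```python
def branch_offset(branch_address: int, target_address: int, instruction_size: int = 4) -> int:
    """Offset stored in a PC-relative branch: target minus the updated PC."""
    updated_pc = branch_address + instruction_size
    return target_address - updated_pc


offset = branch_offset(branch_address=1012, target_address=1000)
print(offset)  # -16
```

A backward branch gives a negative offset and a forward branch a positive one.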
Autoincrement mode
In this mode, the operand is in memory. The effective address of the operand
is the contents of a register specified in the instruction. After the operand is accessed,
the contents of the register are automatically incremented to point to the next item in a
list.
ADD (R2)+, R0
TWO
Autodecrement mode
In this mode, the operand is in memory. The effective address of the operand
is the contents of a register specified in the instruction. Before the operand is accessed,
the contents of the register are automatically decremented to point to the operand in a
list.
ADD -(R2), R0
TWO
Problem
Program to add N numbers stored in memory location starting from
NUM1 LOCATION
MOVE N, R1
MOVE #NUM1, R2
CLEAR R0
LOOP ADD (R2)+, R0
DECREMENT R1
BRANCH >0 LOOP
MOVE R0, SUM
Addressing Modes
EQU:
SUM EQU 200
It informs the assembler that wherever SUM is used, it should be replaced by the value
200.
DATAWORD:
NUM DATAWORD 200
END:
END START
RESERVE
NUM RESERVE 100
ORIGIN RETURN
ORIGIN 200
Numericals
A two-word instruction is stored in a location A. The operand part of
instruction holds B. If the addressing mode is relative, the operand is available
in which location?
A relative mode branch type instruction is stored in memory at an address 750. The
branch is made to an address 500. What should be the value of the relative address field of
the instruction?
Assembly Language Instruction
Label Operation Operand(s) comment
An instruction is stored at location 200 with its address field having the value 10. A
processor register R10 contains the value 210 and is also used as the index register.
Evaluate the effective address of the operand if the addressing mode of the instruction is
(i) direct; (ii) register direct; (iii) register indirect; (iv) indexed.
Registers R1 and R2 of a computer contain the decimal values 100 and 200. What are the
effective addresses of the memory operands in each of the following instructions?
i)LOAD 20 (R2), R1
ii)MOVE 300, R5
iii)ADD (R1), R2
iv)MUL (R1)+, R5
Numericals
a)A machine has a 32-bit architecture, with 1-word long instructions. It has 60
registers, each of which is 32 bits long. It needs to support 45 instructions,
which have an immediate operand in addition to two register operands.
Assuming that the immediate operand is an unsigned integer, what is the
maximum value of the immediate operand?
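A possible way to reason about this problem: subtract the opcode bits and the two register fields from the 32-bit word, and whatever is left is the immediate field. The sketch below assumes a fixed-width opcode and minimal-width register fields; the helper name `bits_needed` is illustrative.

```python
def bits_needed(count: int) -> int:
    """Minimum number of bits that can encode `count` distinct values."""
    return max(1, (count - 1).bit_length())


word_bits = 32
opcode_bits = bits_needed(45)        # 45 instructions
register_bits = bits_needed(60)      # 60 registers
immediate_bits = word_bits - opcode_bits - 2 * register_bits
max_immediate = (1 << immediate_bits) - 1   # unsigned immediate
print(opcode_bits, register_bits, immediate_bits, max_immediate)
```

Under these assumptions the opcode and each register field take 6 bits, leaving 14 bits, so the largest unsigned immediate is 2^14 - 1 = 16383.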
Numericals
A two-word instruction is stored in memory at an address designated by the symbol P.
The address field of the instruction (stored at P+1) is designated by the symbol Q. The
operand used during the execution of the instruction is stored at an address symbolized
by EA. An index register contains the value X. State how EA is calculated from the
other addresses if the addressing mode of the instruction is direct, indirect, relative, and
indexed.
Numericals
Write the number of memory references required for executing the following
instructions:
i)ADD R1,(R2)+
ii)SUB #10,R2
iii)MOV R1, 20(R3,R4)
iv)AND R1,R2
v)Increment A
Numericals
An instruction is kept in memory at an address 300 and the memory address 301 occupies
the address field of the instruction which is shown below. The Opcode is used to add the
content of accumulator with an operand. The content of accumulator is 100 and the
content register R5 is 400. Find out the content of accumulator and Effective address of
operand if the addressing mode is
(i) immediate (ii) direct (iii) register direct (iv) indirect (v) register indirect
Address Instruction
300 Opcode Mode
301 500
400 700
401 456
500 600
600 800
Numericals
A general purpose register organization computer has a 16-bit instruction
consisting of an opcode, a source register and a destination register. It supports 7
arithmetic operations and 6 logical operations. Find the maximum number of
registers that can be present in the system.
Consider a processor with 64 registers that supports twelve instructions. Each instruction
has five distinct fields, namely, opcode, two source register fields, one destination register
field, and a twelve-bit immediate value. Each instruction must be stored in memory in a
byte-aligned fashion. If a program has 100 instructions, What is the amount of memory
(in bytes) consumed by the program?
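The byte-aligned storage question can be approached by adding up the field widths and rounding the instruction up to a whole number of bytes. A minimal sketch under the assumption that every field uses the minimum number of bits; variable names are illustrative:

```python
opcode_bits = (12 - 1).bit_length()      # twelve instructions
register_bits = (64 - 1).bit_length()    # 64 registers
immediate_bits = 12

# opcode + two source registers + one destination register + immediate
instruction_bits = opcode_bits + 3 * register_bits + immediate_bits
instruction_bytes = (instruction_bits + 7) // 8   # round up to whole bytes
program_bytes = 100 * instruction_bytes
print(instruction_bits, instruction_bytes, program_bytes)
```

Here 4 + 18 + 12 = 34 bits, which needs 5 bytes per instruction, so 100 instructions occupy 500 bytes.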
Numericals
A two word instruction LOAD is stored at location 1000 with its address field at location
1001. The address field has the value 2000 and the value stored at 2000 is 5000 and at
5000 is 6500. The words stored at 2200, 3002 are 3500, 4000 respectively. An index
register has value 200. Evaluate the effective address and operand if addressing mode of
the instruction is as follows:
I. Memory Indirect Addressing Mode
II. Relative Addressing Mode
III. Index Addressing Mode
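The three effective addresses in the LOAD problem above can be checked with a small lookup. This sketch assumes that, for relative mode, the PC already points past the two-word instruction (PC = 1002) and that the address field 2000 acts as the offset; the dictionary simply holds the memory words given in the problem.

```python
memory = {2000: 5000, 5000: 6500, 2200: 3500, 3002: 4000}
address_field = 2000
index_register = 200
pc_after_fetch = 1000 + 2    # two-word instruction at 1000 and 1001

effective_address = {
    "memory indirect": memory[address_field],        # EA = M[2000]
    "relative": pc_after_fetch + address_field,      # EA = PC + offset
    "index": address_field + index_register,         # EA = X + [index register]
}
operand = {mode: memory[ea] for mode, ea in effective_address.items()}
for mode in effective_address:
    print(mode, effective_address[mode], operand[mode])
```

Under these assumptions the effective addresses are 5000, 3002 and 2200, and the operands are 6500, 4000 and 3500 respectively.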
Numericals
Write the equivalent instructions for Zero Address Organization and One
Address Organization of the following instructions:
MOV P, R1
SUB Q, R1
DIV R, R1
MUL S, R1
MOV R1, X
Numericals
Both of the following statements cause the value 150 to be stored in location
2000
ORIGIN 2000
DATAWORD 150
And
Move #150,2000
Explain the difference.
Numericals
Match each of the high level language statements given on the left hand side
with the most natural addressing mode from those listed on the right hand side.
Match columns:
A B
Indirect Relocatable code
Index Passing array as a parameter
Base Register Array
Auto increment while (*A++)
Home Work
A program is required for the task C[i]=A[i] x B[i]. Write a program for this task
on a computer that supports one-address instructions. Assume that C, A[i] and
B[i] are located in main memory and the number of elements is stored in main memory
location N.
Instruction Set
An extensive set of instructions is provided to carry out various computational tasks.
According to the operation carried out by the computer, the instructions are classified into
3 categories:
a. Arithmetic Instruction
b. Logical Instruction
c. Shift Instruction
Name Mnemonic
Clear CLR
Complement COM
AND AND
OR OR
Exclusive-OR XOR
Clear carry CLRC
Set carry SETC
Complement carry COMC
Enable interrupt EI
Disable interrupt DI
Data Manipulation Instructions
AND:
is used to reset some specific bit position in a register , keeping all
other bits intact( unchanged).
R1 0 1 0 1 1 0 0 1
Let us say we want to change the 4th bit position (counting from the right, starting at 1) to 0
without disturbing the other bits. Then we need to AND the 4th bit with 0 and all other bits
with 1, since X AND 1 = X and X AND 0 = 0.
0 1 0 1 1 0 0 1
AND 1 1 1 1 0 1 1 1
0 1 0 1 0 0 0 1
AND #F7H, R1
Data Manipulation Instructions
OR:
is used to SET some specific bit position in a register , keeping all
other bits intact( unchanged).
R1 0 1 0 1 1 0 0 1
Let us say we want to change the 3rd bit position (counting from the right, starting at 1) to 1
without disturbing the other bits. Then we need to OR the 3rd bit with 1 and all other bits
with 0, since X OR 1 = 1 and X OR 0 = X.
0 1 0 1 1 0 0 1
OR 0 0 0 0 0 1 0 0
0 1 0 1 1 1 0 1
OR #04, R1
Data Manipulation Instructions
XOR:
is used to CLEAR the contents of a register .
R1 0 1 0 1 1 0 0 1
0 1 0 1 1 0 0 1
XOR 0 1 0 1 1 0 0 1
0 0 0 0 0 0 0 0
XOR R1, R1
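The three masking uses of AND, OR and XOR on R1 = 01011001 can be reproduced with Python's bitwise operators. This is only a sketch; the variable names are illustrative.

```python
r1 = 0b01011001

cleared = r1 & 0xF7    # AND with 11110111 resets the 4th bit from the right
set_bit = r1 | 0x04    # OR with 00000100 sets the 3rd bit from the right
zeroed = r1 ^ r1       # XOR of a register with itself clears it

print(format(cleared, "08b"))
print(format(set_bit, "08b"))
print(format(zeroed, "08b"))
```

The printed patterns are 01010001, 01011101 and 00000000.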
Problems
SETC:
CF ← 1
CLRC:
CF ← 0
COMC:
CF ← NOT CF
EI:
IF ← 1
DI:
IF ← 0
Data Manipulation Instructions
• Shift
Name Mnemonic
Logical shift right SHR
Logical shift left SHL
Arithmetic shift right SHRA
Arithmetic shift left SHLA
Rotate right ROR
Rotate left ROL
Rotate right through carry RORC
Rotate left through carry ROLC
Logical Left Shift Instruction
SHL: Logical left shift for unsigned numbers.
Provide a means for shifting blocks of bits within a register or memory.
C 0
The contents of the operand are shifted left by the number of bits specified in the source
operand of the instruction. The vacated bits are filled with zeros. The shifted-out bits pass
through the C flag and are then dropped. Shifting an operand left by n bit positions is
equivalent to multiplying the operand by 2^n.
R1 0 0 0 1 0 0 0 0 0
SHL #2, R1
0 1 0 0 0 0 0 0 0
R1
Logical Right Shift Instruction
SHR: Logical right shift for unsigned numbers.
Provide a means for shifting blocks of bits within a register or
memory.
0 REGISTER C
The contents of the operand are shifted right by the number of bits specified in the source
operand of the instruction. The vacated bits are filled with zeros. The shifted-out bits pass
through the C flag and are then dropped. Shifting an operand right by n bit positions is
equivalent to dividing the operand by 2^n.
R1 0 0 0 1 0 0 0 0 0
SHR #2, R1
0 0 0 0 0 1 0 0 0
R1
Arithmetic Left Shift Instruction
SHLA: Arithmetic left shift for signed numbers.
Provide a means for shifting blocks of bits within a register or
memory.
C 0
The contents of the operand are shifted left by the number of bits specified in the source
operand of the instruction. The vacated bits are filled with zeros. The shifted-out bits pass
through the C flag and are then dropped. Shifting an operand left by n bit positions is
equivalent to multiplying the operand by 2^n.
R1 0 0 0 1 0 0 0 0 0
SHLA #2, R1
0 1 0 0 0 0 0 0 0 R1
SHLA affects the Overflow flag. V = R_{n-2} XOR R_{n-1}, i.e., the Overflow flag will be 1 if the
sign changes after the shift operation, else it will be 0.
Arithmetic Right Shift Instruction
SHRA: Arithmetic right shift for signed numbers.
Provide a means for shifting blocks of bits within a register or
memory.
0 REGISTER C
The contents of the operand are shifted right by the number of bits specified in the source
operand of the instruction. The vacated bits are filled with the previous sign bit. The shifted-out
bits pass through the C flag and are then dropped. Shifting an operand right by n bit positions
is equivalent to dividing the operand by 2^n.
R1 1 0 0 1 0 0 1 0 0
SHRA #1, R1
1 1 0 0 1 0 0 1 0
R1
Here, R1 contains -110 before the shift operation, and after the SHRA,
it contains -55
Arithmetic Right Shift Instruction
Example 1
R1 1 1 0 0 1 0 0 1 0
SHRA #1, R1
R1 1 1 1 0 0 1 0 0 1
Here, R1 contains -55 before the shift operation, and after the SHRA, it
contains -28
Example 2
R1 1 1 1 1 1 0 0 1
SHRA #1, R1
R1 1 1 1 1 1 1 0 0 1
Here, R1 contains -7 before the shift operation, and after the SHRA, it
contains -4
Arithmetic Right Shift Instruction
Example 3
R1 0 0 0 0 1 1 1 1 0
SHRA #1, R1
R1 0 0 0 0 0 1 1 1 1
Here, R1 contains +15 before the shift operation, and after the SHRA, it
contains +7
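The SHRA examples can be verified with a small 8-bit model in which the sign bit is copied into the vacated position and the bit shifted out lands in the carry flag. A minimal sketch; `shra` and `to_signed` are illustrative helper names, not instructions of any real assembler.

```python
def to_signed(value: int, bits: int = 8) -> int:
    """Interpret an unsigned bit pattern as a 2's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def shra(value: int, count: int = 1, bits: int = 8):
    """Arithmetic shift right: returns (new value, carry flag)."""
    sign = value & (1 << (bits - 1))
    carry = 0
    for _ in range(count):
        carry = value & 1                 # bit shifted out goes to C
        value = (value >> 1) | sign       # vacated MSB is refilled with the sign
    return value, carry


for pattern in (0b10010010, 0b11001001, 0b11111001, 0b00001111):
    result, carry = shra(pattern)
    print(to_signed(pattern), "->", to_signed(result), "C =", carry)
```

The four cases give -110 to -55, -55 to -28, -7 to -4 and +15 to +7, so an arithmetic right shift rounds towards minus infinity when the number is odd.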
Representing Signed No in 2’s Complement Method
For a +ve Number, to represent in the 2’s complement method, sign bit should be made
0, i.e., the MSb should be 0 and for the magnitude part, just write the binary of the
number.
Example: + 14 in 8 bits:
MSb will be 0, then the binary of 14 in 7 bits, i.e., 0001110
So, +14= 00001110
For a -ve Number, to represent in the 2’s complement method, sign bit should be made
1, i.e., the MSb should be 1 and for the magnitude part, just take the 2’s complement of
the binary of the number. The 2’s complement of a binary number can be taken by copying
the bits from the LSb up to and including the first 1, and then flipping all the
remaining bits.
Example: - 14 in 8 bits:
MSb will be 1,
then the binary of 14 in 7 bits, i.e., 0001110
Next take the 2’s complement of 0001110
So, start copying from LSb, 01 then flip all the remaining bits,
hence the result will be 1110010
So, -14= 11110010
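The two worked examples (+14 and -14 in 8 bits) can be cross-checked by masking the integer to the register width, which yields the same bit pattern as the copy-and-flip rule. A sketch with an illustrative function name:

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Return the `bits`-wide 2's complement bit pattern of `value`."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given number of bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


print(twos_complement(14))
print(twos_complement(-14))
```

This prints 00001110 and 11110010, matching the results obtained by hand.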
Rotate Right Instruction (ROR)
The bits of the destination are rotated right. The number of bits rotated is determined
by the source operand. The bits rotated out of the least significant bit of the operand go
to both the carry bit and the most significant bit of the operand.
Rotate Left Instruction (ROL)
The bits of the destination are rotated left. The number of bits rotated is determined
by the source operand. The bits rotated out of the most significant bit of the operand go to
both the carry bit and the least significant bit of the operand.
Rotate Right through Carry Instruction (RORC)
The bits of the destination are rotated right. The number of bits rotated is determined
by the source operand. The bits rotated out of the least significant bit of the operand go
to the carry bit and the previous carry bit goes to the most significant bit of the operand.
Rotate Left through Carry Instruction (ROLC)
The bits of the destination are rotated left. The number of bits rotated is determined
by the source operand. The bits rotated out of the most significant bit of the operand go to
the carry bit and the previous carry bit goes to the least significant bit of the operand.
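The difference between a plain rotate and a rotate through carry can be seen in a one-bit, 8-bit-register model. This is a sketch only; the four functions are illustrative and each returns the new register value together with the new carry flag.

```python
BITS = 8
MASK = (1 << BITS) - 1


def ror(value):
    carry = value & 1                                  # LSB goes to C and to the MSB
    return (value >> 1) | (carry << (BITS - 1)), carry


def rol(value):
    carry = value >> (BITS - 1)                        # MSB goes to C and to the LSB
    return ((value << 1) & MASK) | carry, carry


def rorc(value, carry_in):
    carry_out = value & 1                              # LSB goes to C
    return (value >> 1) | (carry_in << (BITS - 1)), carry_out  # old C enters the MSB


def rolc(value, carry_in):
    carry_out = value >> (BITS - 1)                    # MSB goes to C
    return ((value << 1) & MASK) | carry_in, carry_out         # old C enters the LSB


print(ror(0b00000001), rorc(0b00000001, 0))
print(rol(0b10000000), rolc(0b10000000, 0))
```

With ROR the bit that leaves the register re-enters at the other end immediately, whereas with RORC it waits in the carry flag for one more rotation, so the register and C together behave like a 9-bit ring.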
Program Control Instructions
Name Mnemonic
Branch BR
Jump JMP
Skip SKP
Call CALL
Return RET
Compare (Subtract) CMP
Test (AND) TST
Program Control Instructions
Call: is used to call a subroutine.
1000: Call P1
1004: Next Instruction
1. Stack[top]← [PC] // Return address, i.e., 1004 is stored onto the stack.
and then
2. PC← ADDRESS OF THE SUBROUTINE // Here it is represented by P1
• Example: A − B computed as A + (−B), with the resulting flags
A:      1 1 1 1 0 0 0 0
B:      0 0 0 1 0 1 0 0
A:      1 1 1 1 0 0 0 0
+(−B):  1 1 1 0 1 1 0 0
Result: 1 1 0 1 1 1 0 0
C=1 Z=0 S=1 V=0
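The flag values in this example can be reproduced by adding the 2's complement of B to A in an 8-bit model. A minimal sketch, assuming C is the carry out of the addition A + (−B) and V is set when two operands of the same sign produce a result of the opposite sign; the function name is illustrative.

```python
def subtract_with_flags(a: int, b: int, bits: int = 8):
    """Compute a - b as a + (-b) and return (result, C, Z, S, V)."""
    mask = (1 << bits) - 1
    negated_b = (-b) & mask                  # 2's complement of b
    total = a + negated_b
    result = total & mask
    carry = total >> bits                    # carry out of the MSB
    zero = int(result == 0)
    sign = result >> (bits - 1)
    sign_a = a >> (bits - 1)
    sign_nb = negated_b >> (bits - 1)
    overflow = int(sign_a == sign_nb and sign != sign_a)
    return result, carry, zero, sign, overflow


result, c, z, s, v = subtract_with_flags(0b11110000, 0b00010100)
print(format(result, "08b"), c, z, s, v)
```

For A = 11110000 and B = 00010100 this gives 11011100 with C=1, Z=0, S=1, V=0.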
Basic Performance Equation
How to improve T?
Stack
STACK
[Figure: main memory with addresses from 0 to 2^k - 1, containing the stack]
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Push Operation
• SP always points to the top element, so to push a new item, SP first has to be
updated to the location where the new element can be stored.
• As the stack grows downward in memory, SP is first decremented,
then NEWITEM is pushed.
Address      Contents
0
.
SP=1976      19       Current top element
1980         -12
1984         23
.
2^k - 1
NEWITEM      19
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Push Operation
• If the stack starts in memory from 2000, then the initial value of SP
should be 2004; only then will the 1st element be pushed at the correct
location.
Address      Contents
0
.
SP=1976      19       Current top element
1980         -12
1984         23
1988         88       Stack
1992
1996
2000         56       Bottom element
.
2^k - 1
NEWITEM      19
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Pop Operation
• SP always points to the top element, so to pop an element, first it has to be
popped off, then SP needs to be updated to point to the next top element.
• As the stack grows downward in memory, SP is incremented after
the top item is popped off.
Move (SP), ITEM
ADD #4,SP
↓
Move (SP)+, ITEM
Address      Contents
0
.
SP=1980      -12      Current top element
1984         23
1988         88       Stack
1992
1996
2000         56       Bottom element
.
2^k - 1
ITEM         -12
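The push and pop rules for a downward-growing stack can be sketched with a small class. The class name `DownwardStack` and the dictionary standing in for memory are illustrative assumptions; the 4-byte element size and the initial SP of 2004 for a stack whose bottom element sits at 2000 follow the slides.

```python
class DownwardStack:
    """Stack that grows toward lower addresses; SP points at the top element."""

    def __init__(self, initial_sp: int, word_size: int = 4):
        self.memory = {}
        self.sp = initial_sp
        self.word_size = word_size

    def push(self, item):
        self.sp -= self.word_size        # decrement SP first ...
        self.memory[self.sp] = item      # ... then store: Move NEWITEM, -(SP)

    def pop(self):
        item = self.memory[self.sp]      # read the top: Move (SP)+, ITEM
        self.sp += self.word_size        # then increment SP
        return item


stack = DownwardStack(initial_sp=2004)
stack.push(56)
bottom_address = stack.sp
stack.push(-12)
item = stack.pop()
print(bottom_address, item, stack.sp)
```

The first push lands at address 2000, and after pushing and popping -12 the SP is back at 2000.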
Limitation of Link Register
and use of stack for subroutine
Linkage
STACK [Assumption: the stack is from 2000 to 1500]
SafePush Operation
SAFEPUSH  Compare #1500, SP
          Branch <= 0 FULLERROR
          Move NEWITEM, -(SP)
In a full stack, a push operation should not be done. If the value of SP is 1500 or less
than 1500, the stack is already full. Hence, the compare operation gives either 0 or a
negative value. If SP is 1500, then the next push operation would be done at 1496, which
is not part of the stack.
SafePop Operation
[Figure: Call and Return using the link register; labels shown: 2000, PC 1004, Link 1004]
Limitation??
No support for Nesting of Functions.
Limitation of Link Register
Nesting of functions/subroutines is not supported.
Solution
Using Stack: To support nesting of functions.
Assumptions:
Parameters are passed through general purpose registers.
Returning values through general purpose registers.
Example:
We are going to add N numbers stored in consecutive memory locations starting from
the symbolic address NUM1 using a function. the function is going to return the
summation result to the caller.
Stack as Subroutine Linkage Method
Calling program
1000 Move N, R1
Solution:
Parameters are passed onto the stack, before calling the function.
Subroutine
      MOVE 16(SP), R1
      MOVE 20(SP), R2
      CLEAR R0
LOOP  Add (R2)+, R0
      Decrement R1
      Branch > 0 LOOP
      Move R0, 20(SP)
      MoveMultiple (SP)+, R0-R2
      RETURN
Stack contents
SP=1980   [R2]
1984      [R1]
1992      1012
1996      14 10
2000      RESULT OF SUMMATION
2004      28
Stack as Subroutine Linkage Method [Parameter
Passing and returning value using stack]
Calling program
Numerical 2
Suppose a program (or a program task) takes 1 billion instructions to execute on a
processor running at 2 GHz. Suppose also that 50% of the instructions execute in 3 clock
cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles. What is the
execution time for the program or task?
Solution
Total number of instructions = N = 10^9
S = CPI = 0.5 x 3 + 0.3 x 4 + 0.2 x 5 = 3.7
R = 2 GHz
So, T = (S x N)/R
= (3.7 x 10^9)/(2 x 10^9) sec = 1.85 sec
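The same calculation, T = (N x S)/R with S taken as the weighted average CPI, can be written as a short function. A sketch only; the function name and the (fraction, cycles) pair layout are illustrative choices.

```python
def execution_time(instruction_count, mix, clock_hz):
    """Return (T, CPI) for an instruction mix given as (fraction, cycles) pairs."""
    cpi = sum(fraction * cycles for fraction, cycles in mix)
    return instruction_count * cpi / clock_hz, cpi


# 1 billion instructions at 2 GHz: 50% take 3 cycles, 30% take 4, 20% take 5
t, cpi = execution_time(10**9, [(0.5, 3), (0.3, 4), (0.2, 5)], 2e9)
print(f"CPI = {cpi:.1f}, T = {t:.2f} s")

# 100 instructions at 8 MHz: 50% ALU (4 cycles), 30% memory (9), 20% branch (7)
t2, cpi2 = execution_time(100, [(0.5, 4), (0.3, 9), (0.2, 7)], 8e6)
print(f"CPI = {cpi2:.1f}, T = {t2 * 1e6:.2f} microseconds")
```

The first call reproduces CPI = 3.7 and T = 1.85 s; the second corresponds to the 8 MHz numerical with N assumed to be 100 and gives 76.25 microseconds.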
Basic Performance Equation
The basic performance equation, which is fundamental to measuring
computer performance, gives the CPU time as follows.
CPU time to execute a program = T = time/program
Solution
Total number of instructions = N = 10+5+5 = 20
Total cycles = 7*10 + 11*5 + 6*5 = 70+55+30 = 155
S = CPI = 155/20
R = 8 GHz
So, T = (S x N)/R
= ((155/20) x 20)/(8 x 10^9) sec = 19.375 x 10^-9 sec = 19.375 nsec.
Numerical 2
A program is running on an 8MHz processor. Assume that 50% instructions perform an
ALU operation, 30% instructions perform memory operation and 20% instructions
perform Branching. Further assume that instructions performing an ALU operation take 4
clock cycles, instructions performing memory operation take 9 clock cycles and
instructions performing Branching take 7clock cycles. What is the total time taken by the
program?
Solution
Total number of instructions = N = 100
S = CPI = 0.5 x 4 + 0.3 x 9 + 0.2 x 7 = 6.1
R = 8 MHz
So, T = (S x N)/R
= (6.1 x 10^2)/(8 x 10^6) sec = 76.25 μsec
Big Endian and
Little Endian Representation
Big Endian and Little Endian Concept
Little and big endian are two ways of storing multibyte data-types ( int, float,
etc). For Single Byte no ordering is required.
In byte ordering, the "big end" byte is called the "high-order byte" or the
"most significant byte".
The term ‘endian’ as derived from ‘end’ may lead to confusion. The end
denotes which end of the number comes first rather than which part comes at
the end of the sequence of bytes.
The basic endian layout can be seen in the table below:
Big Endian and Little Endian Concept
Example: Represent the integer 14342 in Big Endian and Little Endian
machines.
00 00 00 00 00 00 38 06
BIG ENDIAN
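The byte order of 14342 (0x3806) under both conventions can be inspected with Python's `int.to_bytes`. This sketch assumes a 4-byte integer; a wider integer only adds more zero bytes.

```python
value = 14342                                   # 0x3806

big = value.to_bytes(4, byteorder="big")        # most significant byte at the lowest address
little = value.to_bytes(4, byteorder="little")  # least significant byte at the lowest address

print(hex(value))
print("big endian   :", big.hex(" "))
print("little endian:", little.hex(" "))
```

Reading from the lowest address upward, the big endian layout is 00 00 38 06 and the little endian layout is 06 38 00 00.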
GATE Question On
Big Endian and Little Endian Representation
GATE 2021
If the numerical value of a 2-byte unsigned integer on a little endian computer
is 255 more than that on a big endian computer, which of the following
choices represent(s) the unsigned integer on a little endian computer?
a. 0x6665
b. 0x0001
c. 0x4243
d. 0x0100
Solution: Let’s take the option a.
Little endian = 0x6665
Swapping the two bytes gives the value on the big endian machine = 0x6566
Given, Little endian = 255 + Big endian
255 = 1111 1111 = 0xFF
0x6566 + 0x00FF = 0x6665
So, option a is correct
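The same check can be run on every option: swap the two bytes to get the value a big endian machine would read, and test whether the difference is 255. A sketch, taking the four candidate values to be 0x6665, 0x0001, 0x4243 and 0x0100; the function name is illustrative.

```python
def satisfies_condition(little_value: int) -> bool:
    """True if the little endian reading is 255 more than the big endian reading."""
    raw = little_value.to_bytes(2, byteorder="little")   # bytes as laid out in memory
    big_value = int.from_bytes(raw, byteorder="big")     # same bytes read big endian
    return little_value - big_value == 255


options = {"a": 0x6665, "b": 0x0001, "c": 0x4243, "d": 0x0100}
matches = [label for label, value in options.items() if satisfies_condition(value)]
print(matches)
```

Under this reading of the options, the check passes for a (0x6665 - 0x6566 = 0xFF) and also for d (0x0100 - 0x0001 = 0xFF).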
GATE 2021 (continued)
Checking option b in the same way:
Little endian = 0x0001, so the value on the big endian machine = 0x0100
0x0100 + 0x00FF = 0x01FF, which is not 0x0001
So, option b is not correct
GATE 2018
A processor has 16 integer registers (R0, R1, … , R15) and 64 floating point
registers (F0, F1, … , F63). It uses a 2-byte instruction format. There are four
categories of instructions: Type-1, Type-2, Type-3, and Type 4. Type-1
category consists of four instructions, each with 3 integer register operands
(3Rs). Type-2 category consists of eight instructions, each with 2 floating
point register operands (2Fs). Type-3 category consists of fourteen
instructions, each with one integer register operand and one floating point
register operand (1R+1F). Type-4 category consists of N instructions, each
with a floating point register operand (1F).
(A) 32
(B) 64
(C) 256
(D) 512
GATE 2018
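# of integer registers = 16, i.e., 4 bits (2^4 = 16) are required to represent an integer
register.
# of fp registers = 64, i.e., 6 bits (2^6 = 64) are required to represent a fp register.
Instruction formats (field widths in bits, 16 bits in total):
Type-1: opcode (4) | int reg (4) | int reg (4) | int reg (4)
Type-2: opcode (4) | fp reg (6) | fp reg (6)
Type-3: opcode (6) | int reg (4) | fp reg (6)
Type-4: opcode (10) | fp reg (6)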
GATE 2018
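No. of encodings used by Type-1 instructions: 4 x 2^12 = 2^14
No. of encodings used by Type-2 instructions: 8 x 2^12 = 2^15
No. of encodings used by Type-3 instructions: 14 x 2^10
Encodings left for Type-4: 2^16 - 2^14 - 2^15 - 14 x 2^10 = 2^11
So, N x 2^6 = 2^11
N = 2^11/2^6 = 2^5
N=32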
GATE 2020
A processor has 64 registers and uses 16-bit instruction format. It has two types
of instructions: I-type and R-type. Each I-type instruction contains an opcode, a
register name, and a 4-bit immediate value. Each R-type instruction contains an
opcode and two register names. If there are 8 distinct I-type opcodes, then the
maximum number of distinct R-type opcodes is _____.
# of registers = 64, i.e., 6 bits (2^6 = 64) are required to represent a register.
I-Type: opcode (6) | Reg (6) | Imm (4)
R-Type: opcode (4) | Reg (6) | Reg (6)
Let N be the number of R-Type opcodes possible.
N x 2^12 = 2^16 - 8 x 2^6 x 2^4
N x 2^12 = 2^16 - 2^3 x 2^6 x 2^4
N = 2^4 - 2^1 = 14
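The counting argument used here (total encodings minus those consumed by the I-type instructions, divided by the encodings each R-type opcode needs) can be sketched as a function. The function and parameter names are illustrative.

```python
def max_r_type_opcodes(instruction_bits=16, register_bits=6,
                       immediate_bits=4, i_type_opcodes=8):
    """Largest number of R-type opcodes that fit beside the given I-type opcodes."""
    total_encodings = 1 << instruction_bits
    # every I-type opcode consumes one encoding per (register, immediate) pair
    used_by_i_type = i_type_opcodes << (register_bits + immediate_bits)
    # every R-type opcode consumes one encoding per pair of registers
    per_r_type_opcode = 1 << (2 * register_bits)
    return (total_encodings - used_by_i_type) // per_r_type_opcode


n = max_r_type_opcodes()
print(n)  # 14
```

The 8 I-type opcodes use 8 x 2^10 = 2^13 of the 2^16 encodings, leaving 57344, and each R-type opcode needs 2^12 of them.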
Numerical 3
The content of register R1 is 10110011. What will be the decimal value after execution
of the following instruction? [Assume the number is represented in 2's complement format]
AShiftL #2, R1
C R1 (8 bits)
0 1 0 1 1 0 0 1 1    R1 before the shift
1 0 1 1 0 0 1 1 0    R1 after the 1st shift
0 1 1 0 0 1 1 0 0    R1 after the 2nd shift
Here, R1 contains -77 before the shift operation, and after the AShiftL by 2 positions,
it contains 11001100, i.e., -52
Numerical 4
Execute the following instructions, where R0 is of 8 bits and its content is
11001011.
i) LShiftL #2, R0 ii) AShiftR #1, R0
C R0 (8 bits)
0 1 1 0 0 1 0 1 1    R0 before the shift
1 1 0 0 1 0 1 1 0    R0 after the 1st shift
1 0 0 1 0 1 1 0 0    R0 after the 2nd shift
Here, R0 contains 11001011 (-53 as a signed number) before the shift operation, and after
the LShiftL by 2 positions, it contains 00101100 (decimal 44) with C = 1
Basic Processing Unit
Fundamental Concepts
The processor fetches one instruction at a time and performs the operation specified.
The processor keeps track of the address of the memory location containing the next
instruction to be fetched using the Program Counter (PC).
Assuming that the memory is byte addressable, increment the contents of the
PC by 4 (fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
Single Bus CPU Organization
Basic Operations involved in the execution
of an instruction
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Register Transfer Operation
MOV R1, R2
1. R1out, R2in
Fetching a Word from Memory
MOV (R1),R2
2. MDRinE, WMFC
3. MDRout, R2in
Storing a Word into Memory
MOV R2, (R1)
Address into MAR;
Data into MDR
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Performing an Arithmetic or
Logical Operation
ADD R1, R2, R3
Put one of the operands into Y register
Put the other operand on the bus and
perform the operation
Send the result into the destination
1. R1out, Yin
3. Zout, R3in
Execution of a Complete Instruction
Add (R3), R1
Execution of a Complete Instruction
: Direct mode
MOV 300, R1
2. WMFC
2. WMFC
5. WMFC
6. MDRoutB, R5outA, SelectA, ADD, R5in, end
Hardwired Control Unit
To execute instructions, the processor must have some means of generating
the control signals needed in the proper sequence.
Two categories:
Hardwired control Unit and
Microprogrammed control Unit
For the CU to perform its function, it has some inputs that allow it to determine the state of
the system and outputs that allow it to control the behaviour of the system.
Internally, the CU must have some logic to perform its sequencing and execution
functions.
Hardwired Control Unit
Inputs to CU:
1. Clock(Contents of control step counter)
2. Instruction Register (Contents of IR)
3. Flags(Contents of condition codes)
2. Next, find out in which instructions the signal appears, and in which step of each
instruction it appears.
3. Say Zin appears in the 6th step of the instruction Add (R3), R1. Zin must then be
generated when the instruction is ADD and the step is 6, i.e., when both conditions are
true (an AND). Similarly, for the JMP L1 instruction, Zin is generated in step 4.
Zin must be generated in either of the two cases.
4. So, the logic function for Zin is the OR of the above two AND terms:
Zin = ADD.T6 + JMP.T4 + ... [+ ... indicates other possible cases]
5. We have also seen that Zin is required in step 1 of every instruction, during the
fetch phase, irrespective of the instruction. So,
Zin = T1 + ADD.T6 + JMP.T4 + ...
Generating the Zin Signal
Add (R3), R1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, end
JMP L1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_field_of_IRout, SelectY, Add, Zin
Step 2: We have seen that Zin is active for all the instructions in control step 1,
i.e., it does not depend on the instruction.
Next, Zin is active in step 6 for the ADD instruction,
Zin is active in step 4 for the JMP instruction,
Zin is active in step 4 for the BRN (branch on negative) instruction when the N flag is set, ...
Step 3:
Zin = T1 + ADD.T6 + JMP.T4 + BRN.T4.N
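The logic function for Zin can be sketched in software as follows. This is only an illustration of the Boolean expression, not of the hardware: the function name `z_in`, the instruction mnemonics passed as strings, and the `n_flag` parameter are assumptions made for the example.

```python
def z_in(step, instruction, n_flag=False):
    """Evaluate Zin = T1 + ADD.T6 + JMP.T4 + BRN.T4.N for one control step."""
    def t(k):
        # Tk is the step-decoder output that is active only in step k
        return step == k

    return (
        t(1)                                              # fetch phase, every instruction
        or (instruction == "ADD" and t(6))                # ADD.T6
        or (instruction == "JMP" and t(4))                # JMP.T4
        or (instruction == "BRN" and t(4) and n_flag)     # BRN.T4.N
    )


print(z_in(6, "ADD"))          # True
print(z_in(4, "BRN", False))   # False
```

Each AND term combines one instruction-decoder output with one step-decoder output; the OR of all such terms gives the signal.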
Logic function for Zin signal
Logic function for End signal
Write the expressions to represent the circuit for generating control signals S5
and S10 respectively. What will be the specification of the step decoder and the
instruction decoder in the hardwired control unit?
A computer has 58 instructions; each instruction requires at most 15
steps to complete its execution. What will be the specification of
instruction and step counter decoder used in hardware control unit
design?
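One way to approach this kind of question is sketched below. It assumes that each decoder needs just enough input bits to give every instruction (or step) its own output line; the helper name `decoder_spec` is hypothetical.

```python
def decoder_spec(count):
    """Return (input_bits, output_lines) of the smallest decoder covering `count` items."""
    bits = max(1, (count - 1).bit_length())   # smallest n with 2**n >= count
    return bits, 2 ** bits


print(decoder_spec(58))   # instruction decoder: (6, 64)
print(decoder_spec(15))   # step decoder: (4, 16)
```

Under these assumptions, 58 instructions call for a 6-to-64 instruction decoder and 15 steps call for a 4-to-16 step decoder.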
A hardwired CPU has only 3 instructions I1, I2 and I3, which use the
following signals in time steps T1-T5:
      T1              T2          T3         T4         T5
I1    Ain, Bout, Cin  PCout, Bin  Zout, Ain  Bin, Cout  End
I2    Cin, Bout, Din  Aout, Bin   Zout, Ain  Bin, Cout  End
I3    Din, Aout       Ain, Bout   Zout, Ain  Dout, Ain  End
ML Address  0   1   2   3   4   5   6   7
Content     10  23  25  20  12  3   1   2
3
25
12
20
Consider an example of memory organization as shown in the figure
below. Which value will be loaded into the accumulator when the
instruction “LOAD INDIRECT 7” is executed?
ML Address  0   1   2   3   4   5   6   7
Content     10  23  25  20  12  3   1   2
2
25
7
20
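The indirect access can be traced with a short sketch. It assumes the usual meaning of indirect addressing: the location named in the instruction holds the address of the operand. The names `memory`, `load_direct` and `load_indirect` are illustrative.

```python
# Memory contents from the table above: index = address, value = content
memory = [10, 23, 25, 20, 12, 3, 1, 2]


def load_direct(mem, address):
    """Direct mode: the operand is the content of the given address."""
    return mem[address]


def load_indirect(mem, address):
    """Indirect mode: the given address holds the address of the operand."""
    effective_address = mem[address]
    return mem[effective_address]


print(load_indirect(memory, 7))   # 25
```

Location 7 holds 2, and location 2 holds 25, so under this interpretation the accumulator receives 25.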
Consider a three word machine instruction-
ADD A[R0], @B
The number of memory cycles needed during the execution cycle of the
instruction is?
Microprogrammed Control Unit
An alternative to a hardwired control unit is a microprogrammed control unit, in which the
logic of the control unit is specified by a microprogram.
The term microprogram was first coined by M. V. Wilkes in the early 1950s
Introduction to Microprogrammed CU
Add (R3), R1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, end
μinstruction  PCout  MARin  MDRout  Read  Select  Add  Zin  Zout  PCin  Yin  WMFC  IRin  R3out  ...  End
1             1      1      0       1     1       1    1    0     0     0    0     0     0      ...  0
2             0      0      0       0     0       0    0    1     1     1    1     0     0      ...  0
3             0      0      1       0     0       0    0    0     0     0    0     1     0      ...  0
4             0      1      0       1     0       0    0    0     0     0    0     0     1      ...  0
(The remaining control-signal columns, shown as "...", are 0 in these four rows.)
Terms Related to Microprogrammed Control Unit
Control Store: The instruction set of any computer is finite. The microroutines for
all the instructions in the instruction set are stored in a special memory called the
control store (or control memory).
μPC: The μPC points to the next microinstruction that needs to be fetched from the
control store.
Organization of Control Memory
The fig. shows how the control words or
microinstructions could be arranged in a control
memory.
There is one bit for each internal processor control line and one bit for each system bus control line.
There is a condition field indicating the condition under which there should be a branch, and there
is a field with the address of the microinstruction to be executed next when a branch is taken.
Interpretation
To execute this microinstruction, turn on all the control lines indicated by a 1 bit; leave off all control
lines indicated by a 0 bit. The resulting control signals will cause one or more micro-operations to be
performed.
If the condition indicated by the condition bits is false, execute the next microinstruction in sequence.
If the condition indicated by the condition bits is true, the next microinstruction to be executed is
indicated in the address field.
Microprogrammed CU
To execute any instruction, the CU should first find the starting address of the
corresponding microroutine and then can generate the control signals in sequence
by reading the control words one by one.
Microprogrammed Control Unit
In this control unit, the μPC is incremented every time a new microinstruction
is fetched from the microprogram memory, except in the following situations:
1. When a new instruction is loaded into the IR, the μPC is loaded with the starting
address of the microroutine for that instruction.
3. When an End microinstruction is encountered the μPC is loaded with the address
of the first CW in the microroutine for the instruction fetch cycle.
Microprogrammed CU
2’s Complement Numbers and Condition Codes
Register Transfer Notation
Identify a location by a symbolic name standing for its hardware binary
address (LOC, R0,…)
Contents of a location are denoted by placing square brackets around the
name of the location.
R1 ← [LOC],
R3 ← [R1] + [R2]
The 2’s complement of a binary number can be taken by copying its bits from the LSb
up to and including the first 1, and then flipping all the remaining bits.
Example: -14 in 8 bits:
The binary of 14 in 8 bits is 00001110.
Next, take the 2’s complement of 00001110:
start copying from the LSb into the result up to the first 1 (inclusive), then flip all
the remaining bits.
Hence the result is 11110010.
So, -14 = 11110010
In the other method: -14 = 2^8 - 14 = 256 - 14 = 242 = 11110010 (128+64+32+16+2)
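Both methods can be sketched as follows, so that one can be checked against the other. The function names are illustrative, and bit patterns are handled as strings of '0' and '1' purely for readability.

```python
def twos_complement_copy_flip(bits):
    """Copy bits from the LSb up to and including the first 1, then flip the rest."""
    result = []
    seen_one = False
    for b in reversed(bits):              # walk from the LSb to the MSb
        if seen_one:
            result.append('1' if b == '0' else '0')   # flip remaining bits
        else:
            result.append(b)                          # copy unchanged
            if b == '1':
                seen_one = True
    return ''.join(reversed(result))


def twos_complement_subtract(value, width):
    """Second method: 2**width - value, written in `width` bits."""
    mask = (1 << width) - 1
    return format(((1 << width) - value) & mask, f'0{width}b')


print(twos_complement_copy_flip('00001110'))   # 11110010
print(twos_complement_subtract(14, 8))         # 11110010
```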
Representing Signed No in 2’s Complement Method
For a +ve Number, to represent in the 2’s complement method, just write the binary of
the number.
Example: + 14 in 8 bits:
The binary of 14 in 8 bits, i.e., 00001110
Basically, a flag register contains a collection of bits, where each bit position
indicates a particular condition that may occur when an instruction is
executed.
Condition Codes
Four commonly used flags are:
C (Carry): Set to 1 if a carry-out results from the operation, else cleared to 0.
Note: Overflow occurs when the result of an arithmetic operation is outside the
range of values that can be represented with the available number of bits.
Overflow Condition
Overflow can occur only when adding two numbers of same sign.
The carry out signal from the sign bit position is not a sufficient indicator of
overflow when adding signed numbers.
When both operands X and Y have the same sign, an overflow occurs when the
sign of the sum S does not match the signs of X and Y.
Overflow Condition [Example 1]
For example, add +7 and +4 using 4 bits:
+7 = 0111
+4 = 0100

  0111
+ 0100
------
  1011

So, the addition of +7 and +4 generates an overflow condition, as the result is
outside the range of values that can be represented using 4 bits.
How to interpret the result?
As the numbers are represented in 2’s complement form and the sign bit is 1,
the result is a negative quantity. To get the magnitude, take the 2’s
complement of the result (1011).
2’s complement of 1011 is 0101, (0101)_2 = (5)_10
So, the result is -5,
i.e., (+7) + (+4) = -5, an incorrect result.
The correct result is (+7) + (+4) = +11, and to represent +11 we need 5 bits: 1 for the
sign and 4 bits for the magnitude.
Overflow Condition [Example 2]
For example, add -4 and -6 using 4 bits:
-4 = 1100
-6 = 1010

   1100
+  1010
-------
 1 0110   (the carry out is discarded from the result)

So, the addition of -4 and -6 generates an overflow condition, as the result is
outside the range of values that can be represented using 4 bits.
However, this time a carry is generated out of the MSb addition.
How to interpret the result?
As the numbers are represented in 2’s complement form and the sign bit is 0,
the result is a positive quantity. To get the magnitude, just write the decimal
equivalent of the resultant bits.
(0110)_2 = (6)_10
So, the result is +6,
i.e., (-4) + (-6) = +6, an incorrect result.
The correct result is (-4) + (-6) = -10, and to represent -10 we need 5 bits: 1 for the
sign and 4 bits for the magnitude.
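The two overflow examples can be reproduced with a small sketch. It applies the rule stated earlier (operands of the same sign, result of a different sign) to a 4-bit adder; the function name `add_signed` and its return format are assumptions for illustration.

```python
def add_signed(x, y, width=4):
    """Add two signed integers in `width`-bit 2's complement.

    Returns (result bit string, signed interpretation, overflow flag).
    """
    mask = (1 << width) - 1
    xb, yb = x & mask, y & mask
    result = (xb + yb) & mask             # carry out of the MSb is discarded

    def sign(v):
        return (v >> (width - 1)) & 1

    # overflow: operands share a sign, but the result has the other sign
    overflow = sign(xb) == sign(yb) and sign(result) != sign(xb)
    signed_value = result - (1 << width) if sign(result) else result
    return format(result, f'0{width}b'), signed_value, overflow


print(add_signed(7, 4))     # ('1011', -5, True)
print(add_signed(-4, -6))   # ('0110', 6, True)
```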
Conditional Codes Numerical
Consider a register R1 that contains the value 10101010 and R2 that contains
11110000. What will be the values of the carry, zero and overflow flags after the
execution of the instruction
ADD R1,R2 // R2 is the destination
   [R1]:   1 0 1 0 1 0 1 0
 + [R2]:   1 1 1 1 0 0 0 0
         1 1 0 0 1 1 0 1 0
C=1, Z=0, N=1, V=0
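A sketch of how the four flags could be derived for an 8-bit addition is given below. The function name `add_with_flags` and the dictionary layout are assumptions; V uses the standard sign-comparison test, which is equivalent to the same-sign rule given under Overflow Condition.

```python
def add_with_flags(a, b, width=8):
    """Add two `width`-bit patterns and return (result, flags) with C, Z, N, V."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    msb = width - 1
    flags = {
        "C": total >> width,                  # carry out of the MSb
        "Z": int(result == 0),                # result is all zeros
        "N": result >> msb,                   # sign bit of the result
        # V is 1 when both operands differ in sign from the result
        "V": (((a ^ result) & (b ^ result)) >> msb) & 1,
    }
    return result, flags


result, flags = add_with_flags(0b10101010, 0b11110000)
print(format(result, '08b'), flags)
# 10011010 {'C': 1, 'Z': 0, 'N': 1, 'V': 0}
```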
Conditional Codes Numerical
Consider a register R1 contains a value 1 1 1 1 0 0 0 0 and R2 contains 0 0 0 1
0 1 0 0 .What will be the value of carry , zero, negative(sign) and overflow
flags after the execution of the instruction
SUB R2,R1// R1 is the destination
R1← [R1] - [R2]
R1← [R1] +( - [R2])
So, we will take the 2’s complement of R2’s content to take the negative of R2’s content.
We will take the 2’s complement of (00010100)
Some Basic Concepts
[Figure: Connection of the memory to the processor. The MAR drives a k-bit address bus (up to 2^k addressable locations), the MDR connects to an n-bit data bus (word length = n bits), and the control lines carry R/W, MFC, etc.]
Memory Capacity
Some Basic concepts
Measures for the speed of a memory:
memory access time.
memory cycle time.
The time gap between the initiation of two consecutive memory operations
is called as the memory cycle time.
Some Basic concepts
◼ An important design issue is to provide a computer system with as large
and fast a memory as possible, within a given cost target.
◼ Several techniques to increase the effective size and speed of the memory:
▪ Cache memory (to increase the effective speed).
▪ Virtual memory (to increase the effective size).
Organization of bit cells in a 16 x 8 Memory chip
[Figure: Address lines A0-A3 feed an address decoder that drives word lines w0-w15. Each row of memory cells (flip-flops) shares a word line; each column shares a bit-line pair (b7/b’7 ... b0/b’0) connected to a Sense/Write circuit. R/W and CS are the control inputs.]
All cells of a row are connected to a common line, known as the “word
line”.
[Figure: memory chip organization with a 10-bit address, a 32-to-1 output multiplexer and input demultiplexer, and R/W and CS control inputs.]
Two inverters are cross connected to implement a basic storage element “latch”.
The cell is connected to one word line and two bit lines by transistors T1 and
T2.
When word line is at ground level, the transistors are turned off and the latch
retains its state.
Implementation of a SRAM Cell
Read operation:
1. In order to read the state of the SRAM cell, the word line is activated to close switches T1
and T2.
2. Sense/Write circuits at the bottom monitor the state of b and b’ and set the output
accordingly.
3. If the cell is in state 1, the signal on bit line b is high and the signal on bit line b’ is
low.
4. The opposite is true if the cell is in state 0.
Implementation of a SRAM Cell
Write operation:
1. The state of the cell is set by placing the appropriate value on bit line b and its
complement on b’, and then activating the word line.
3. The required signals on the bit lines are generated by the Sense/Write circuit.
Implementation of a DRAM Cell
Dynamic RAM (DRAM): slow, cheap, and dense memory.
Cell Implementation:
1-Transistor cell (pass transistor)
Trench capacitor (stores bit)
A sense amplifier connected to the bit line detects whether the charge stored on the
capacitor is above the threshold value.
If so, it drives the bit line to a full voltage that represents logic value 1. This voltage
recharges the capacitor to the full charge that corresponds to logic value 1.
If the sense amplifier detects that the charge on the capacitor is below the threshold
value, it pulls the bit line to ground level, which ensures that the capacitor will
have no charge, representing logic value 0.
Hence, reading the contents of the cell automatically refreshes its contents.
SRAM Vs DRAM Cell
Static RAMs (SRAMs):
Consist of circuits that are capable of retaining their state as long as the power is
applied.
Volatile memories, because their contents are lost when power is interrupted.
[Figure: DRAM cell voltage over a refresh cycle, showing the threshold voltage and the stored voltage for logic 0.]
2M X 8 Memory Design
Each row can store 512 bytes. 12 bits are needed to select a row, and 9 bits to
select a group in a row, for a total of 21 bits.
i.e., we need to connect 512K x 8 chips in a matrix form, where the number of columns
= width of one location in the larger memory / width of one location in the smaller chip.
No of columns = 32/8 = 4.
2M X 32 Using 512K X 8 Chips
Step 3: Find out the no of rows:
#rows x 4 =16
#rows =16/4=4
For a 2M x 32 memory, 21 address lines are required (2M = 2^21), and for a 512K x 8 chip,
19 address lines are required (512K = 2^19). So, out of the 21 address lines, the lower-order
19 lines are connected to all the 512K x 8 memory chips.
Then, to select one of the 4 rows of 512K x 8 memory chips, the higher-order 2
address lines (out of the 21) are connected to a decoder, and the output of
the decoder selects a particular row. From the selected row, the 4 chips of 512K x 8
each give/take 8 bits of data, meeting the required width of 32 bits.
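The three design steps can be collected into a small calculator. This is a sketch that assumes all sizes are powers of two and divide evenly; the function name `design_memory` and the keys of the returned dictionary are illustrative.

```python
def design_memory(target_words, target_width, chip_words, chip_width):
    """Chip-matrix parameters for building a larger memory from smaller chips."""
    columns = target_width // chip_width          # chips in parallel for the data width
    rows = target_words // chip_words             # groups of chips selected by the decoder
    total_address_bits = (target_words - 1).bit_length()
    chip_address_bits = (chip_words - 1).bit_length()
    return {
        "chips": rows * columns,
        "columns": columns,
        "rows": rows,
        "total_address_bits": total_address_bits,
        "chip_address_bits": chip_address_bits,
        # higher-order address lines that feed the row-select decoder
        "decoder_input_bits": total_address_bits - chip_address_bits,
    }


K, M = 2 ** 10, 2 ** 20
print(design_memory(2 * M, 32, 512 * K, 8))
# {'chips': 16, 'columns': 4, 'rows': 4, 'total_address_bits': 21,
#  'chip_address_bits': 19, 'decoder_input_bits': 2}
```

The same call with other sizes, for example `design_memory(1 * K, 8, 256, 4)`, gives the corresponding parameters for the other numericals.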
Numerical : Design 8M X 32 bits memory using
512KX8 bits memory chip.
Step 1: Find out , how many smaller size chips are required to meet the
required size:
i.e., we need to connect 512K x 8 chips in a matrix form, where the number of columns
= width of one location in the larger memory / width of one location in the smaller chip.
No of columns = 32/8 = 4.
Numerical : Design 8M X 32 bits memory using
512KX8 bits memory chip.
Step 3: Find out the no of rows:
#rows x 4 =64
#rows =64/4=16
For an 8M x 32 memory, 23 address lines are required (8M = 2^23), and for a 512K x 8 chip,
19 address lines are required (512K = 2^19). So, out of the 23 address lines, the lower-order
19 lines are connected to all the 512K x 8 memory chips.
Then, to select one of the 16 rows of 512K x 8 memory chips, the higher-order 4
address lines (out of the 23) are connected to a decoder, and the output of
the decoder selects a particular row. From the selected row, the 4 chips of 512K x 8
each give/take 8 bits of data, meeting the required width of 32 bits.
Numericals
A computer employs RAM chips of 256 X 8 . The computer system needs 2K bytes of
RAM . Design the memory module of above configuration
Step 1: Find out , how many smaller size chips are required to meet the required size:
Divide the total required size/ Size of the smaller chip
(2K x 8) / (256 x 8) = (2^11 x 2^3) / (2^8 x 2^3) = 2^3 = 8 chips
Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data width [finding the no of columns in the matrix arrangement].
Here, in 2K x 8, 8 bits of data is required, and in 256 x 8 chip, 8 bits of data
can be communicated from one location.
So, 1, 256 x 8 chip needs to be connected in a column to meet the 8 bits data
size.
No of columns = 1
Numerical : Design 2K X 8 bits memory using 256X8
bits memory chip.
Step 3: Find out the no of rows:
#rows x 1 =8
#rows = 8
For a 2K x 8 memory, 11 address lines are required (2K = 2^11), and for a 256 x 8 chip, 8
address lines are required (256 = 2^8). So, out of the 11 address lines, the lower-order 8 lines
are connected to all the 256 x 8 memory chips.
Then, to select one of the 8 rows of 256 x 8 memory chips, the higher-order 3
address lines (out of the 11) are connected to a decoder, and the output of
the decoder selects a particular row. From the selected row, the 1 chip of 256 x 8
gives/takes 8 bits of data, meeting the required width of 8 bits.
Numericals
A computer uses RAM chips of 256 X 4 capacity. Design a memory capacity of 1KB by
using available chip.
Step 1: Find out , how many smaller size chips are required to meet the required size:
Divide the total required size/ Size of the smaller chip
(1K x 8) / (256 x 4) = (2^10 x 2^3) / (2^8 x 2^2) = 2^3 = 8 chips
Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data width [finding the no of columns in the matrix arrangement].
Here, in 1K x 8, 8 bits of data are required, and a 256 x 4 chip can communicate
4 bits of data from one location.
So, two 256 x 4 chips need to be connected in parallel to meet the 8-bit data
width,
i.e., we need to connect 256 x 4 chips in a matrix form, where the number of columns
= width of one location in the larger memory / width of one location in the smaller chip.
No of columns = 8/4 = 2
Numerical : Design 1K X 8 bits memory using 256X4
bits memory chip.
Step 3: Find out the no of rows:
#rows x 2 =8
#rows = 4
For a 1K x 8 memory, 10 address lines are required (1K = 2^10), and for a 256 x 4 chip, 8
address lines are required (256 = 2^8). So, out of the 10 address lines, the lower-order 8 lines
are connected to all the 256 x 4 memory chips.
Then, to select one of the 4 rows of 256 x 4 memory chips, the higher-order 2
address lines (out of the 10) are connected to a decoder, and the output of
the decoder selects a particular row. From the selected row, the 2 chips of 256 x 4
each give/take 4 bits of data, meeting the required width of 8 bits.
Typical Memory Hierarchy
[Figure: memory hierarchy, from top to bottom: processor registers, primary cache (L1), secondary cache (L2), main memory. Cost per unit size increases toward the top.]
Registers are at the top of the hierarchy. They are the fastest storage elements, present
inside the CPU, but limited in number. Access time < 0.5 ns.
Level 1 Cache: On-chip cache, designed using SRAM technology. Typical size is in the
range 8 – 64 KB, with an access time of about 1 ns.
L2 Cache: Off-chip cache. Typical size is in the range 512 KB – 8 MB, with an access
time of 3 to 10 ns.
Cache memory is an architectural arrangement which makes the main memory appear
faster to the processor than it really is.
These instructions may be the ones in a loop, nested loop or few procedures calling each
other repeatedly.
This is called “locality of reference”.
Goal is to achieve
Fast speed of cache memory access.
Balance the cost of the memory system.
The Basics of Caches
[Figure: the cache sits between the processor and the main memory.]
When the processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.
Subsequent references to the data in this block of words are found in the cache.
At any given time, only some blocks in the main memory are held in the cache.
Mapping functions determine where a main memory block will be placed in the
cache.
When the cache is full, and a block of words needs to be transferred from the main
memory, some block of words in the cache must be replaced. This is determined by
a “replacement algorithm”.
The Basics of Caches: Cache Hit
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.
Read hit:
❖ The data is obtained from the cache.
Write hit:
Read Miss:
1. Block of words containing the requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor. This
is called load-back.
1. Write Allocate:
Allocate new block in cache.
Write miss acts like a read miss, block is fetched and updated.
2. No Write Allocate:
Sends data to lower-level memory.
Cache is not modified.
When processor generates an address for any read/write operation, first we check
whether that address’s content is present in the cache memory or not.
If yes, then we perform our read/ write operation from the cache memory.
If not, then we bring the block containing the address that we are trying to access for
our read/write operation from the main memory. i.e., the unit of transfer between main
memory and cache is a block.
The main memory is bigger compared to the cache, hence the length of the main
memory address is more compared to the cache memory’s address.
In block placement/ mapping functions we will see how to access a byte within a
block in the cache memory, for a given main memory address.
Mapping functions
Mapping functions determine how memory blocks are placed in the cache.
No of blocks in main memory = 2^(b+w) / 2^w = 2^b
Number of lines in cache = k = 2^r
j modulo 128
(a)Calculate the number of bits in each of the Tag, Block, and Word fields of the memory
address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:
All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, Number of bits in the Word field = 6
For a given 16-bit address, the 5 most significant bits, represent the Tag, the next 5 bits
represent the Block, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.
Access # 1:
Address = (128)_10 = (0000000010000000)_2
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)
Since the cache is empty before this access, this will be a cache miss
After this access, Tag field for cache block 00010 is set to 00000
Problem: Solution
Access # 2:
Address = (144)_10 = (0000000010010000)_2
Since tag field for cache block 00010 is 00000 before this access, this will be a cache hit
(because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)_10 = (0000100010000000)_2
Since tag field for cache block 00010 is 00000 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001
Access # 4:
Address = (2180)_10 = (0000100010000100)_2
Since tag field for cache block 00010 is 00001 before this access, this will be a cache hit
(address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)_10 = (0000000010000000)_2
Since tag field for cache block 00010 is 00001 before this access, this will be a cache miss
(address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00000
Access # 6:
Address = (2176)_10 = (0000100010000000)_2
Since tag field for cache block 00010 is 00001 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001
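The six accesses can be replayed with a minimal direct-mapped cache model. The sketch assumes the field widths from part (a), i.e. 5 tag bits, 5 block bits and 6 word bits; the function name `simulate_direct_mapped` is illustrative, and only the tags are modelled, not the data.

```python
def simulate_direct_mapped(addresses, block_bits=5, word_bits=6):
    """Return 'hit' or 'miss' for each address in a direct-mapped cache."""
    lines = {}                                    # cache block number -> stored tag
    outcomes = []
    for addr in addresses:
        block = (addr >> word_bits) & ((1 << block_bits) - 1)
        tag = addr >> (word_bits + block_bits)
        if lines.get(block) == tag:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            lines[block] = tag                    # the new block replaces the old one
    return outcomes


print(simulate_direct_mapped([128, 144, 2176, 2180, 128, 2176]))
# ['miss', 'hit', 'miss', 'hit', 'miss', 'miss']
```

All six addresses map to cache block 00010, so blocks with tags 00000 and 00001 keep evicting each other.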
Example on Cache Placement & Misses
• Consider a small direct-mapped cache with 32 blocks
- Main memory of size 4GB
– Cache is initially empty, Block size = 16 bytes
– The following memory addresses (in decimal) are referenced:
2000, 2004, 2008, 3548, 3552, 3556.
– Map addresses to cache blocks and indicate whether hit or miss
• Solution: the 32-bit address is divided into Tag = 23 bits, Index = 5 bits, Offset = 4 bits
Numericals on Mapping
Direct Mapping Question: Assume a computer has 32-bit addresses. Each block stores 16
words. A direct-mapped cache has 256 blocks. In which block (line) of the cache would we
look for each of the following addresses? Addresses are given in hexadecimal for
convenience.
a. 1A2BC012 b. FFFF00FF c. 12345678 d. C109D532
So, in the address 1A2BC012, 01 is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 01 in the cache for a hit or a miss.
So, in the address FFFF00FF, 0F is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 0F in the cache for a hit or a miss.
So, in the address 12345678, 67 is the block no in the cache, to which this address will be
mapped to, i.e., we will check block no 67 in the cache for a hit or a miss.
So, in the address C109D532 , 53 is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 53 in the cache for a hit or a miss.
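The block numbers for the four addresses can be verified with a short sketch. It assumes word-addressed memory, so 16 words per block need 4 offset bits and 256 blocks need 8 block bits; the function name `cache_line` is hypothetical.

```python
def cache_line(address, words_per_block=16, num_lines=256):
    """Cache block (line) number for an address in a direct-mapped cache."""
    offset_bits = (words_per_block - 1).bit_length()   # 4 bits for 16 words
    block_number = address >> offset_bits              # drop the word offset
    return block_number % num_lines                    # keep the block field


for addr in (0x1A2BC012, 0xFFFF00FF, 0x12345678, 0xC109D532):
    print(f"{addr:08X} -> block {cache_line(addr):02X}")
# 1A2BC012 -> block 01
# FFFF00FF -> block 0F
# 12345678 -> block 67
# C109D532 -> block 53
```

Because the offset is 4 bits and the block field is 8 bits, the block number is simply the second and third hexadecimal digits from the right.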
Numericals
Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main
memory is 128 KB. Find the number of bits in tag field and the size of the tag directory.
Solution:
Assumption: The memory is byte addressable.
The size of the main memory = 128 KB = 2^17 B.
Hence, the no of bits in the main memory address is 17 bits.
Block size = 256 Bytes = 2^8 Bytes
Therefore, Number of bits in the Word field = 8
# of Blocks in the cache = Size of the cache memory / Block size
= 16 KB / 256 B = 2^14 / 2^8 = 2^6
i.e., 6 bits are required to represent a block no in the cache
# of tag bits = total address length - (block field length + word field length )
=17-(6+8)
=3 bits
Size of the tag directory = no of blocks in the cache x no of bits in the tag field
= 2^6 x 3 bits = 192 bits
= 24 Bytes
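The field-length arithmetic for a direct-mapped cache can be sketched as below, using the formula "tag directory = no of blocks in the cache x tag bits". It assumes byte-addressable memory and power-of-two sizes; the function name `direct_mapped_fields` and the dictionary keys are illustrative.

```python
def direct_mapped_fields(cache_bytes, block_bytes, memory_bytes):
    """Address field lengths and tag directory size for a direct-mapped cache."""
    address_bits = (memory_bytes - 1).bit_length()
    word_bits = (block_bytes - 1).bit_length()
    blocks = cache_bytes // block_bytes
    block_bits = (blocks - 1).bit_length()
    tag_bits = address_bits - (block_bits + word_bits)
    return {
        "address_bits": address_bits,
        "word_bits": word_bits,
        "block_bits": block_bits,
        "tag_bits": tag_bits,
        "tag_directory_bits": blocks * tag_bits,
    }


KB = 2 ** 10
print(direct_mapped_fields(16 * KB, 256, 128 * KB))
# {'address_bits': 17, 'word_bits': 8, 'block_bits': 6, 'tag_bits': 3,
#  'tag_directory_bits': 192}
```

For the 16 KB cache above this gives 64 blocks with 3 tag bits each, i.e. 192 bits or 24 bytes of tag storage.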
Numericals
Consider a direct mapped cache with block size 2 KB. The size of main memory is 64 GB
and there are 10 bits in the tag. Find the size of cache memory and the size of tag directory.
Solution:
Main memory = 64 GB = 2^36 B, so the address is 36 bits; block size = 2 KB = 2^11 B, so the
word field is 11 bits.
block field length = total address length - (# of tag bits + word field length) = 36 - (10 + 11)
= 15 bits
The size of cache memory = # of blocks x size of each block = 2^15 x 2^11 Bytes = 2^26 B
= 64 MB
Size of the tag directory = no of blocks in the cache x no of bits in the tag field
= 2^15 x 10 bits
= 40960 Bytes
Numericals
Consider a machine with a byte addressable main memory of 2^32 bytes divided into blocks
of size 32 bytes. Assume that a direct mapped cache having 1K cache lines is used with
this machine. What is the size of the tag field ?
Solution:
Assumption: The memory is byte addressable.
Block size = 32 bytes = 2^5 Bytes
Therefore, Number of bits in the Word field = 5
# of Blocks in the cache = 1K = 2^10
i.e., 10 bits are required to represent a block no in the cache
The size of the main memory = 2^32 B.
Hence, the no of bits in the main memory address is 32 bits.
# of tag bits = total address length - (block field length + word field length )
=32-(10+5)
=17 bits
Numericals
A 16 KB direct-mapped write back cache is organized as multiple blocks, each of size 16
bytes. The processor generates 32 bit addresses. The cache controller maintains the tag
information for each cache block comprising of the following-
1 valid bit, 1 modified bit, 2 replacement bits and as many bits as the minimum needed
to identify the memory block mapped in the cache.
What is the total size of memory needed at the cache controller to store meta data (tags) for
the cache?
Solution:
Assumption: The memory is byte addressable.
Block size = 16 bytes = 2^4 Bytes
Therefore, Number of bits in the Word field = 4
# of Blocks in the cache = Size of the cache memory / Block size
= 16 KB / 16 B = 2^14 / 2^4 = 2^10
i.e., 10 bits are required to represent a block no in the cache
# of tag bits = total address length - (block field length + word field length )
=32-(10+4)
=18 bits
Size of the memory required to store the meta data = # of blocks in the cache x (tag bits +
1 valid bit + 1 modified bit + 2 replacement bits)
= 2^10 x (18 + 1 + 1 + 2) bits = 1024 x 22 bits = 22528 bits = 2816 Bytes
A main memory block can occupy any of the free blocks from a specific set of
blocks.
Cache memory is organized as sets of blocks. There may be 2 blocks per set, 4
blocks per set, or in general m blocks per set. The number of blocks present in a
set is called the way number.
Here a main memory block i will be mapped to a specific set j, where
j = i mod (no of sets in the cache)
The main memory block i can be placed in any of the blocks present in the
set j.
To find a main memory block i in the cache, the tag of block i is compared
only with the tags of the blocks present in the set j.
Set-Associative Cache
Replacement of a block is required only when all the blocks in a set are occupied
and an incoming block is mapped into that set.
For associative mapped cache, way number is the number of blocks in the
cache.
Here, main memory address will be divided into 3 parts: tag, set, word.
To find the bits in the set field, we need to find the no of sets in the cache: no
of blocks in the cache/ way no. Then expressing that result in the powers of 2
and taking the exponent as the length of the set field.
To find the bits in the tag field, we need to find the number of blocks in the main
memory/the no of sets in the cache. Then expressing that result in the powers of 2
and taking the exponent as the length of the tag field.
Set Associative Mapped Cache
Problem
A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a 2-way
set associative manner with 64 bytes per cache block. Assume that the size of each memory
word is 1 byte.
(a)Calculate the number of bits in each of the Tag, set, and Word fields of the memory address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:
All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, Number of bits in the Word field = 6
For a given 16-bit address, the 6 most significant bits, represent the Tag, the next 4 bits
represent the Set, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.
Access # 1:
Address = (128)_10 = (0000000010000000)_2
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)
Since the cache is empty before this access, this will be a cache miss
After this access, Tag field for the first block in the cache set 0010 is set to 000000
Problem: Solution
Access # 2:
Address = (144)_10 = (0000000010010000)_2
Since tag field for the first cache block in the set 0010 is 000000 before this access, this
will be a cache hit (because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)_10 = (0000100010000000)_2
The tag field for this address does not match the tag field for the first block in set 0010.
The second block in set 0010 is empty. Therefore, this access will be a cache miss.
After this access, Tag field for the second block in set 0010 is set to 000010
Access # 4:
Address = (2180)_10 = (0000100010000100)_2
Since tag field for the 2nd cache block in the set 0010 is 000010 before this access, this
will be a cache hit (address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)_10 = (0000000010000000)_2
The tag field for this address matches the tag field for the first block in set 0010. Therefore,
this access will be a cache hit.
Access # 6:
Address = (2176)_10 = (0000100010000000)_2
The tag field for this address matches the tag field for the second block in set 0010.
Therefore, this access will be a cache hit.
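The same access sequence can be replayed on a 2-way set-associative model. The sketch assumes the field widths found in part (a): 6 tag bits, 4 set bits and 6 word bits. The LRU replacement policy is an assumption, since the problem does not name one, and no replacement is actually triggered by this sequence. The function name `simulate_set_associative` is illustrative.

```python
def simulate_set_associative(addresses, ways=2, set_bits=4, word_bits=6):
    """Return 'hit' or 'miss' for each address in a set-associative cache."""
    sets = {}                                     # set number -> tags, least recently used first
    outcomes = []
    for addr in addresses:
        index = (addr >> word_bits) & ((1 << set_bits) - 1)
        tag = addr >> (word_bits + set_bits)
        tags = sets.setdefault(index, [])
        if tag in tags:
            outcomes.append("hit")
            tags.remove(tag)                      # re-appended below as most recently used
        else:
            outcomes.append("miss")
            if len(tags) == ways:
                tags.pop(0)                       # evict the least recently used block
        tags.append(tag)
    return outcomes


print(simulate_set_associative([128, 144, 2176, 2180, 128, 2176]))
# ['miss', 'hit', 'miss', 'hit', 'hit', 'hit']
```

Both blocks map to set 0010 but can reside there together, so the last two accesses hit, unlike in the direct-mapped case.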
Numericals
Consider a fully associative mapped cache of size 16 KB with block size 256 bytes. The
size of main memory is 128 KB. Find the number of bits in tag field and the size of the tag
directory.
Solution:
Assumption: The memory is byte addressable.
The size of the main memory = 128 KB = 2^17 B.
Hence, the no of bits in the main memory address is 17 bits.
Block size = 256 Bytes = 2^8 Bytes
Therefore, Number of bits in the Word field = 8
# of tag bits = total address length - word field length
=17- 8
=9 bits
Size of the tag directory = no of blocks in the cache x no of bits in the tag field
# of Blocks in the cache =Size of the cache memory/ Block size
= 16KB / 256B = 2^14/2^8 = 2^6
Size of the tag directory = 64 x 9 bits = 576 bits = 72 Bytes
Numericals
Consider a fully associative cache of size 256 KB with block size 1 KB. There are 18 bits
in the tag. Find the size of main memory and the size of the tag directory.
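Solution:
Assumption: The memory is byte addressable.
Block size = 1 KB = 2^10 Bytes
Therefore, Number of bits in the Word field = 10
total address length = # of tag bits + word field length = 18 + 10 = 28 bits
The size of main memory = 2^28 bytes = 256 MB
# of Blocks in the cache = 256 KB / 1 KB = 2^8 = 256
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 256 x 18 bits
= 576 Bytes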
Numericals
Consider a fully associative mapped cache with block size 4 KB. The size of main
memory is 16 GB. Find the number of bits in tag.
Solution:
# of Sets in the cache =No of blocks in the cache memory/ Way no(# of blocks in a set)
= 2^5/4 = 2^3
total address length = # of tag bits + # of set bits + word field length
# of tag bits = 18 -( 3+9) = 6 bits
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 32 x 6 bits
= 24 Bytes
Numericals
Consider a 4-way set associative mapped cache of size 512 KB with block size 1 KB.
There are 9 bits in the tag. Find the size of main memory and the size of the tag directory
Solution:
Assumption: The memory is byte addressable.
Block size = 1KB = 2^10 Bytes
Therefore, Number of bits in the Word field = 10
# of Blocks in the cache =Size of the cache memory/ Block size
= 512KB / 1KB = 2^19/2^10 = 2^9
# of Sets in the cache =No of blocks in the cache memory/ Way no(# of blocks in a set)
= 2^9/4 = 2^7
total address length = # of tag bits + # of set bits + word field length
= 9+ 7+ 10 = 26 bits
The size of main memory = 2^26 bytes = 64MB
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 512 x 9 bits
= 576 Bytes
Numericals
Consider a 4-way set associative mapped cache with block size 4 KB. The size of main
memory is 16 GB and there are 10 bits in the tag. Find the size of cache memory and the
size of the tag directory.
Solution:
Assumption: The memory is byte addressable.
Block size = 4 KB = 2^12 Bytes
Therefore, Number of bits in the Word field = 12
The size of main memory = 16GB = 2^34 bytes
total address length = # of tag bits + # of set bits + word field length
# of set bits = 34 - (10 + 12) = 12 bits
Size of the cache memory = # of sets in the cache x way no x size of a block
= 2^12 x 4 x 2^12 Bytes = 2^26 Bytes = 64MB
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 2^12 x 4 x 10 bits
= 20,480 Bytes
Numericals on Mapping
A cache consists of a total of 128 blocks. The main memory contains 2K
blocks, each consisting of 32 words.
( I )How many bits are there in each of the TAG, BLOCK and WORD
field in case of direct mapping?
( ii )How many bits are there in each of the TAG, SET, and WORD
field in case of 4-way set-associative mapping?
Solution:
Assumption: The memory is word addressable.
Block size = 32 words = 2^5 words
Therefore, Number of bits in the Word field = 5
# of Blocks in the cache = 128 = 2^7
i.e., 7 bits are required to represent a block no in the cache
# of Blocks in the main memory = 2K = 2^11
To find the tag bits, # of Blocks in the main memory/ # of Blocks in the cache
= 2^11/2^7 = 2^4
The no of tag bits = 4
Solution:
Length of the main memory address = No of tag bits + bits in the set field + no of bits in
word field
= 11+ x+y
Now replacing the value of x+y from equation <1 > , we get
Length of the main memory address = 11+ 15 = 26 bits
Size of the main memory = 2^26 Bytes = 64MB
Numericals
Consider a 4-way set associative mapped cache. The size of main memory is 64 MB and
there are 11 bits in the tag. Find the size of cache memory.
Solution:
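The size of main memory = 64 MB = 2^26 bytes, so the address length is 26 bits.
total address length = # of tag bits + # of set bits + word field length
Let x = # of set bits and y = word field length.
26 = 11 + x + y
=> x+y = 15................................<1>
Size of the cache memory = # of sets x way no x block size = 2^x x 2^2 x 2^y = 2^(2+x+y) Bytes
Now replacing the value of x+y from equation <1>, we get
Size of the cache memory = 2^(2+15) Bytes
= 2^17 Bytes = 128 KB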
Numericals
A computer has a 256 KB, 4-way set associative, write back data cache with block size of
32 bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag
directory entry contains in addition to address tag, 2 valid bits, 1 modified bit and 1
replacement bit. What is the width of the tag field and the size of the tag directory?
Solution:
Assumption: The memory is byte addressable.
Block size = 32B = 2^5 Bytes
Therefore, Number of bits in the Word field = 5
# of Sets in the cache =No of blocks in the cache memory/ Way no(# of blocks in a set)
= (Size of the cache memory/ Block size ) / Way no
= (256 KB / 32 B ) / 4
= (2^18/2^5) / 2^2 = 2^13/2^2 = 2^11
total address length = # of tag bits + # of set bits + word field length
# of TAG bits = 32 - (11 + 5) = 16 bits
Width of tag field = address tag bits + 2 valid bits + 1 modified bit + 1 replacement bit
= 16 + 2 + 1 +1 = 20bits
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 2^13 x 20 bits
= 20,480 Bytes
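The tag-width and tag-directory arithmetic used in these numericals can be sketched as one small function. The name `tag_directory` and its parameters are illustrative; the sketch assumes byte-addressable memory and power-of-two sizes, and models a fully associative cache by setting `ways` equal to the number of blocks.

```python
def tag_directory(cache_bytes, block_bytes, ways, address_bits, extra_bits=0):
    """Return (address tag bits, tag directory size in bytes).

    extra_bits: per-entry bits stored besides the address tag
    (valid, modified, replacement bits and so on).
    """
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    word_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    set_bits = sets.bit_length() - 1
    tag_bits = address_bits - set_bits - word_bits
    directory_bits = blocks * (tag_bits + extra_bits)
    return tag_bits, directory_bits // 8


# 256 KB, 4-way, 32-byte blocks, 32-bit addresses, 4 extra bits per entry
print(tag_directory(256 * 1024, 32, 4, 32, extra_bits=4))
# 512 KB, 4-way, 1 KB blocks, 26-bit addresses
print(tag_directory(512 * 1024, 1024, 4, 26))
```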
Numericals
Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block requests
are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the sequence?
3, 18, 20, 30
Also, calculate the hit ratio and miss ratio.
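One way to check this numerical is to simulate the direct mapped cache, where memory block b always goes to cache block (b mod 8). This is a minimal sketch; the function name `simulate_direct_mapped` is illustrative.

```python
def simulate_direct_mapped(requests, num_blocks):
    """Return (final cache contents, number of hits) for a direct mapped cache."""
    cache = [None] * num_blocks
    hits = 0
    for block in requests:
        line = block % num_blocks      # the only cache block this memory block may use
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block        # miss: load, replacing whatever was there
    return cache, hits


requests = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24]
cache, hits = simulate_direct_mapped(requests, 8)
print(cache)
print("hit ratio =", hits, "/", len(requests))
```

With this sequence the simulation reports 3 hits out of 21 requests, and memory block 18 is the one no longer in the cache at the end.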
Numericals
Consider a fully associative cache with 8 cache blocks (0-7). The memory block requests
are in the order-
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7?
Also, calculate the hit ratio and miss ratio.
Numericals
Consider a 4-way set associative mapping with 16 cache blocks. The memory block
requests are in the order-
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
If LRU replacement policy is used, which of the following memory blocks will not be present in the cache?
3
8
129
216
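The two LRU numericals above can be checked with one set-associative simulator, as a sketch under these assumptions: memory block b maps to set (b mod number of sets), a fully associative cache is a single set containing all blocks, and empty cache blocks are filled in order 0, 1, 2, ... The name `simulate_lru` is illustrative.

```python
from collections import OrderedDict


def simulate_lru(requests, num_blocks, ways):
    """Return (sets, hits). Each set maps memory block -> position within the set,
    ordered from least recently used to most recently used."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in requests:
        s = sets[block % num_sets]
        if block in s:
            hits += 1
            s.move_to_end(block)              # now the most recently used
        else:
            if len(s) < ways:
                slot = len(s)                 # next empty position in the set
            else:
                _, slot = s.popitem(last=False)   # evict the least recently used
            s[block] = slot
    return sets, hits


# Fully associative, 8 cache blocks
fa_sets, fa_hits = simulate_lru(
    [4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7], 8, 8)
print("memory block 7 is in cache block", fa_sets[0][7], "hits:", fa_hits)

# 4-way set associative, 16 cache blocks (4 sets)
sa_sets, sa_hits = simulate_lru(
    [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155], 16, 4)
present = set().union(*[s.keys() for s in sa_sets])
print([b for b in (3, 8, 129, 216) if b not in present])
```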
If the block contains valid data, then the bit is set to 1, else it is 0.
Valid bits are set to 0, when the power is just turned on.
When a block is loaded into the cache for the first time, the valid bit is set to 1.
Data transfers between main memory and disk occur directly bypassing the cache.
[DMA Technique]
Case 1: from Disk to Main memory
When a new block is transferred from disk into main memory block ‘i’ using the
DMA technique, and a copy of block ‘i’ also exists in the cache, the cache copy
becomes ‘stale’ because of this DMA transfer: main memory block ‘i’ now
contains different data.
So, cache memory block ‘j’ corresponding to main memory block ‘i’ needs to be
invalidated, i.e., the valid bit is cleared to 0 for cache block j.
Cache Coherence Problem
Case 2: What happens if the data is transferred from the main memory to the disk
and the write-back protocol is being used?
In this case, the data in the cache may have been changed and is indicated by the dirty
bit.
The copies of the data in the cache, and the main memory are different. This is called the
cache coherence problem.
One option is to force a write-back of the dirty cache blocks before the data is transferred from the main memory to the disk.
Interleaving
Divides the memory system into a number of memory modules. Each module has its
own address buffer register (ABR) and data buffer register (DBR).
Arranges addressing so that successive words in the address space are placed in
different modules.
When requests for memory access involve consecutive addresses, the access will be to
different modules.
Since parallel access to these modules is possible, the average rate of fetching words
from the Main Memory can be increased.
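The address arrangement described above (successive words placed in different modules) can be sketched as follows. This assumes low-order interleaving, where the low-order address bits select the module; the function name `interleave` is illustrative.

```python
def interleave(address, num_modules):
    """Map a word address to (module number, address within the module)."""
    module = address % num_modules            # low-order bits pick the module
    address_in_module = address // num_modules
    return module, address_in_module


# Consecutive addresses 0..7 spread over 4 modules
for address in range(8):
    print(address, interleave(address, 4))
```

Because consecutive addresses land in different modules, a run of sequential accesses can keep all modules busy in parallel.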
Methods of address layouts
Introduction to Virtual Memory
An important challenge in the design of a computer system is to provide a large, fast
memory system at an affordable cost.
Cache memories were developed to increase the effective speed of the memory system.
Virtual memory is an architectural solution to increase the effective size of the memory
system.
Large programs that cannot fit completely into the main memory have their parts stored
on secondary storage devices such as magnetic disks.
▪ Pieces of programs must be transferred to the main memory from secondary storage
before they can be executed.
When a new piece of a program is to be transferred to the main memory, and the
main memory is full, then some other piece in the main memory must be replaced.
▪ Recall this is very similar to what we studied in case of cache memories.
Introduction to Virtual Memory
Operating system automatically transfers data between the main memory and secondary
storage.
▪ Application programmer need not be concerned with this transfer.
▪ Also, application programmer does not need to be aware of the limitations imposed
by the available physical memory.
Techniques that automatically move program and data between main memory and
secondary storage when they are required for execution are called virtual-memory
techniques.
Programs and processors reference an instruction or data independent of the size of the
main memory.
Paging
The concept of virtual memory is implemented using paging.
The logical memory is divided into equal-sized blocks called pages, and the
main memory is divided into blocks of the same size called frames.
When a process is executed, its pages are loaded into the available memory
frames. The pages belonging to the process can be stored in any of the free frames;
they need not be contiguous.
A particular page number (p) can be allocated any frame (f) in the main
memory.
Paging ..
• As a process consists of many pages, they will be scattered
throughout the available frames in the main memory.
The index into the page table is the page number, and the entry at that
index is the frame no allocated to the page.
Address Translation: VA to PA
Paging ..
To generate a PA for a given LA, the page number (p) is used as an index into the page table
(PT).
The page table base register (PTBR) contains the base address of the PT.
PTBR's content is combined with the page number (p) to locate the corresponding entry
in the PT.
Let the entry be 'f'. So 'f' is the frame number for the given page no (p).
The frame no 'f' is combined with the offset 'd' to get the actual physical address (PA).
Example (page size = 4, LA = 6, PT[1] = 2):
PA = 2 x 4 + 2 = 10
In Binary:
LA = 6 = 110
p = 1, d = 10
PA = PT[1] followed by d
= 1010 [As PT[1] = 2 = 10]
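The translation steps can be sketched in a few lines. Only PT[1] = 2 and the page size of 4 come from the example; the other page table entries and the name `translate` are hypothetical, chosen just to make the sketch runnable.

```python
def translate(logical_address, page_table, page_size):
    """Translate a logical address to a physical address using a page table."""
    p, d = divmod(logical_address, page_size)   # page number and offset
    f = page_table[p]                           # frame number from the page table
    return f * page_size + d


# PT[1] = 2 is from the example; the other entries are hypothetical
page_table = {0: 5, 1: 2, 2: 7}
print(translate(6, page_table, 4))   # LA = 6 -> p = 1, d = 2 -> PA = 2 x 4 + 2
```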
Virtual Memory: TLB
Paging with TLB
With paging, to access a byte from memory we need to refer to the memory twice:
first to the page table, to get the frame number,
and second to get the byte from the generated physical address.
To overcome this slowdown, a special, small, fast-lookup hardware cache called a
translation look-aside buffer (TLB) is used.
Paging with TLB
When an associative memory is presented with an item, the item is compared with all the
keys simultaneously.
On a TLB miss, the frame no is obtained from the page table; it is used to generate the PA, and
the page no and frame no are entered into the TLB for quick future reference.
Use of TLB in Paging..
If the TLB is maintained process-wise, then it has to be flushed on every
context switch.
To avoid this, some TLBs store an address-space identifier (ASID) in each
TLB entry.
Answer : LA = p + d
To get 'p', express the no of pages as a power of 2; the exponent will
be 'p'.
256 = 2^8
so, p = 8 bits
To get d, express the page size as a power of 2; the exponent
will be d.
4KB = 2^12 B
so, d = 12 bits
Therefore LA = p + d = (8 + 12) bits = 20 bits
Answer : PA = f + d
To get 'f', express the no of frames as a power of 2; the exponent will be 'f'.
64 = 2^6
so, f = 6 bits
d will be the same for LA and PA
d = 12 bits
Therefore PA = f + d = (6 + 12) bits
= 18 bits
Practice Questions
Assume a program consists of 8 pages and a computer has 16 frames of
memory. A page consists of 4096 words and memory is word addressable.
Currently, page 0 is in frame 2, page 4 is in frame 15, page 6 is in frame 5
and page 7 is in frame 9. No other pages are in memory. Translate the
memory addresses below.
a. 111000011110000    b. 000000000000000
Effective Access Time using TLB
• Hit Ratio (h): The percentage of times that the requested page number is
found in the TLB.
• EMAT = h(TLB + MA) + (1-h)(TLB + 2 x MA), where TLB is the TLB access time
and MA is the memory access time.
• When we find a page in the TLB, the access time is the sum of the
time used to access the TLB and the time taken to access the data from the
memory. This is represented in the first part of the equation.
• When we miss the page in the TLB, the total time is the
sum of the time used to access the TLB (miss) + the time used to access the
PT + the time used to access the data item. This is the second part of the equation.
Problem
Memory access time = 50 ns
Hit Ratio = 75%
TLB access time= 2ns
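Therefore, with the TLB,
EMAT = 0.75 x (2 + 50) + 0.25 x (2 + 2 x 50)
= 39 + 25.5 = 64.5 ns
Therefore, without the TLB, every reference needs two memory accesses:
EMAT = 2 x MA
= 2 x 50 ns
= 100 ns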
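The EMAT formula can be evaluated directly for the values in this problem. This is a minimal sketch; the function name `emat` and its parameter names are illustrative.

```python
def emat(hit_ratio, tlb_time, memory_time):
    """Effective memory access time with a TLB (all times in the same unit)."""
    hit_cost = tlb_time + memory_time           # TLB hit: one memory access
    miss_cost = tlb_time + 2 * memory_time      # TLB miss: page table + data
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


with_tlb = emat(0.75, 2, 50)      # values from the problem, in ns
without_tlb = 2 * 50              # page table access + data access
print(with_tlb, without_tlb)
```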
Problem
The size of virtual memory is 256G Bytes and the physical memory is 4G
Bytes. The page size is 8M Bytes. What would be the size of page table
assuming 6 bits are used as control bits in the page table.
Solution:
Virtual Memory = 256G Bytes = 2^8 x 2^30 Bytes = 2^38 bytes
Logical address = 38 bits
Page size = 8M Bytes = 2^23 Bytes
Page no = (38-23) = 15 bits    Offset = 23 bits
Physical memory = 4G Bytes = 2^32 Bytes
Frame no = (32-23) = 9 bits
The page table has one entry per page, i.e., 2^15 entries, and each entry consists of
frame no (9 bits) + control bits (6 bits, given) = 15 bits
Size of the page table = 2^15 x 15 bits = 491,520 bits = 61,440 Bytes = 60 KB
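The page table size calculation can be sketched as below, assuming one page table entry per virtual page and power-of-two sizes. The function name `page_table_bytes` is illustrative.

```python
def page_table_bytes(virtual_bytes, physical_bytes, page_bytes, control_bits):
    """Size of a single-level page table in bytes."""
    entries = virtual_bytes // page_bytes          # one entry per page
    frames = physical_bytes // page_bytes
    frame_bits = (frames - 1).bit_length()         # bits needed for a frame number
    return entries * (frame_bits + control_bits) // 8


size = page_table_bytes(2**38, 2**32, 2**23, 6)
print(size, "bytes =", size // 1024, "KB")
```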
h1=0.75 Tavg1=52ns
t2=20t1
Tavg2=hN × t1+(1- hN ) × t2
hN =
2-Level Cache
PROBLEM
Consider a three level memory system with access times per word 20ns,
40ns, 100ns. Hit ratios are 0.7, 0.8 and 1 respectively. If the referred word
is not available in level1 get the two word block from level2 to level1 and
supply the desired word to the processor. If it is not available in level2
then get a 4 word block from level3 to level2 and transfer the associated
block from level2 to level1. Handover the desired word to processor from
level1. what is the average access time?
tave= h1t1 +(1-h1)[h2(tB+t1)+(1-h2)(tB’+tB+t1)]
h1=.7, h2=.8
tave= 0.7x20 +
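The formula for tave can be evaluated numerically as a sketch. It assumes that tB is the time to move a two-word block from level2 (2 x 40 ns) and that tB’ is the time to move a four-word block from level3 (4 x 100 ns); the variable names are illustrative.

```python
t1, t2, t3 = 20, 40, 100     # access time per word at each level, in ns
h1, h2 = 0.7, 0.8            # hit ratios of level1 and level2 (level3 always hits)

t_b = 2 * t2                 # assumed: 2-word block transferred from level2 to level1
t_b_prime = 4 * t3           # assumed: 4-word block transferred from level3 to level2

t_ave = h1 * t1 + (1 - h1) * (
    h2 * (t_b + t1) + (1 - h2) * (t_b_prime + t_b + t1)
)
print(round(t_ave, 2), "ns")
```

Under these assumptions the three terms are 14 ns, 24 ns and 30 ns.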
Thank You
Multiplication of Signed Integers
Booth’s Algorithm
Booth’s Algorithm
It treats both +ve and -ve multipliers in a uniform way.
 1  0  0  1  1  1  0  0  1  0  1  0  1  0  0
-1  0 +1  0  0 -1  0 +1 -1 +1 -1 +1 -1  0  0
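The recoding shown above can be reproduced with a short sketch: each Booth digit is the bit to the right minus the bit itself, with an implied 0 to the right of the LSB. The function name `booth_recode` is illustrative.

```python
def booth_recode(bits):
    """Booth-recode a two's-complement multiplier given as a bit string (MSB first).

    Returns the recoded digits (each -1, 0 or +1), MSB first.
    """
    padded = bits + "0"   # implied 0 to the right of the least significant bit
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


print(booth_recode("100111001010100"))
print(booth_recode("01011"))   # +11
```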
Booth’s Algorithm
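-13 x 11 using Booth’s Algorithm
          1  0  0  1  1     (-13, multiplicand M)
      x  +1 -1 +1  0 -1     (+11, Booth-recoded multiplier)
0 0 0 0 0 0 1 1 0 1         (-1 x M)
0 0 0 0 0 0 0 0 0           ( 0 x M)
1 1 1 1 0 0 1 1             (+1 x M)
0 0 0 1 1 0 1               (-1 x M)
1 1 0 0 1 1                 (+1 x M)
1 1 0 1 1 1 0 0 0 1         (sum)
Ans is = -(2’s comp of 1 1 0 1 1 1 0 0 0 1)
= - 10001111 = -143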
-7 x -11 using Booth’s Algorithm
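+7 = 00111    +11 = 01011
-7 = 11001
-11 = 10101 = -1 +1 -1 +1 -1 (Booth recoded)
          1  1  0  0  1     (-7, multiplicand M)
      x  -1 +1 -1 +1 -1     (-11, Booth-recoded multiplier)
0 0 0 0 0 0 0 1 1 1         (-1 x M)
1 1 1 1 1 1 0 0 1           (+1 x M)
0 0 0 0 0 1 1 1             (-1 x M)
1 1 1 1 0 0 1               (+1 x M)
0 0 0 1 1 1                 (-1 x M)
0 0 0 1 0 0 1 1 0 1         (sum)
Ans is = 77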
Multiplication of Signed Integers
Booth’s Algorithm[Method 2]
Flowchart for Booth’s Algorithm
Booth’s Algorithm (-7 x 3) [+7 = 0111, -7 = 1001]
A       Q (Multiplier)   Q-1   M (Multiplicand)   Operation / Remark
0000    0011             0     1001               Initially
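The register-level method (registers A, Q, Q-1 and M) can be sketched as follows. This is an illustrative implementation of the standard add/subtract and arithmetic-shift-right loop, not a transcription of the flowchart; the function name `booth_multiply` is an assumption, and the sketch does not handle a multiplicand equal to the most negative n-bit value.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit two's-complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    a, q, q_1 = 0, multiplier & mask, 0
    m = multiplicand & mask
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask        # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask        # A <- A + M
        # Arithmetic shift right of the combined A, Q, Q-1
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & sign_bit)
    product = (a << n) | q            # 2n-bit result held in A and Q
    if product & (1 << (2 * n - 1)):  # interpret as two's complement
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-7, 3, 4))
```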
1 0 0 1 1 1 0 0 1 0 1 0 1 0 0
-1 0 +1 0 0 -1 0 +1 -1 +1 -1 +1 -1 0 0
The general case.
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
-1 +1 -1+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1-1
The Worst case.
1 1 1 1 1 0 0 0 0 0 1 1 1 1 1
0 0 0 0 -1 0 0 0 0 +1 0 0 0 0 -1
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set
q0 to 1
Repeat these steps n times.
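Restoring Division Example (15 / 5)
M = 0 0101, -M = 1 1011
                     A         Q
Initially            0 0000    1111
First cycle
  ShiftL             0 0001    111_
  Subtract           1 1011
                     1 1100    1110    (A negative: set q0 to 0)
  Restore            0 0101
                     0 0001
Second cycle
  ShiftL             0 0011    110_
  Subtract           1 1011
                     1 1110    1100    (A negative: set q0 to 0)
  Restore            0 0101
                     0 0011
Third cycle
  ShiftL             0 0111    100_
  Subtract           1 1011
                     0 0010    1001    (A positive: set q0 to 1)
Fourth cycle
  ShiftL             0 0101    001_
  Subtract           1 1011
                     0 0000    0011    (A positive: set q0 to 1)
Remainder = 0 0000 = 0, Quotient = 0011 = 3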
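The restoring-division steps can be sketched as below, for unsigned operands. A signed Python integer stands in for register A, so "the sign of A is 1" becomes `a < 0`; the function name `restoring_divide` is illustrative.

```python
def restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    a, q = 0, dividend
    q_mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one bit position (MSB of Q moves into A)
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & q_mask
        a -= divisor                  # subtract M from A
        if a < 0:
            a += divisor              # restore A; q0 stays 0
        else:
            q |= 1                    # set q0 to 1
    return q, a


print(restoring_divide(15, 5, 4))
```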
Non restoring Division
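In restoring division, after an unsuccessful subtraction (A negative), M is added back to A;
in the next cycle the result is shifted left and M is subtracted from it.
A+M
2(A+M)
2(A+M) - M
2A+2M - M = 2A +M
So instead of restoring, a negative A can simply be shifted left and M added to it.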
Why Floating Point Numbers?
The maximum value that can be represented as an unsigned integer using 32 bits is
4,294,967,295, which equals 2^32 − 1.
To represent numbers beyond this range, 32 bits are not sufficient; with our
conventional methods we would need a larger number of bits.
Solution:
IEEE 754 Standard: Single Precision(32 bits)
Double Precision(64 bits)
Introduction to Basic Terms
For a number 6.0247 x 10^23
It consists of 3 components:
Biased exponent means that we do not store the exponent as a signed number; we store it
only as a non-negative number.
So to get only +ve values, we use a biased exponent: we add a bias to the
actual exponent, and the result is stored as the exponent of the number.
Then, to perform implicit normalization, we move the binary point so that it is
immediately to the right of the first 1 in the number.
1001.01 = 1.00101 x 2^3
Steps to convert a given decimal number into IEEE Format
Step1: Convert the number into Binary.
Step 4: Add 127 to the exponent , then write the binary of it for E’
13.25 = 1101.01
1101.01 = 1.10101 x 2^3
E’=127+3=130= 10000010
Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 23 bits, then append zeros to the right of the actual mantissa to
make it of length 23 bits.
Solution: S=1
E’=1000 0010
M= 1010 1000 0000 0000 0000 000
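The S, E’ and M fields of a single-precision number can be checked with Python's standard `struct` module, which packs a float into its IEEE 754 bit pattern. The helper name `ieee754_single` is illustrative.

```python
import struct


def ieee754_single(x):
    """Return (sign, exponent bits, mantissa bits) of x in IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent E'
    mantissa = bits & 0x7FFFFF          # 23 stored mantissa bits
    return sign, f"{exponent:08b}", f"{mantissa:023b}"


print(ieee754_single(-13.25))
print(ieee754_single(-17.0))
```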
Example : Convert -13.25 into IEEE Double Precision Format
Step1: Convert the number into Binary.
13.25 = 1101.01
1101.01 = 1.10101 x 2^3
17 = 10001
10001.0 = 1.0001 x 2^4
E’=127+4=131= 10000011
Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 23 bits, then append zeros to the right of the actual mantissa to
make it of length 23 bits.
Solution: S=1
E’=1000 0011
M= 0001 0000 0000 0000 0000 000
Example : Convert -0.35 into IEEE Format
Step1: Convert the number into Binary.
0.35 = 0.01011
S=0
E’= 10000011
M=11000....................00
S=0
E’= 10000000
M=11000....................00
S=0
E’= 11111110
M=11111..................11
S=1
E’= 11111110
M=11111..................11
S=0
E’= 00000001
M=0000.....0000
S=1
E’= 00000001
M=0000.....0000
1.1 x 2^-130
M=1
E’=-130+127=-3
The stored exponent cannot be negative. The IEEE-754 standard says that such a small
number is shifted only until its exponent reaches -126.
0.00011 x 2^-126
M=00011
E’=00000000
This is an example of a denormalized number
Number 1: Number 2:
S=0 S=0
E’= 1000 0100 E’= 1000 0011
M=000111100.....0 M=0100100...00000
Number 1: Number 2:
1.0001111 x 2^5          1.01001 x 2^4
Choose the number with the smaller exponent and shift its mantissa right until the
exponents of both the numbers are equal. So number 2 will become
0.1010010 x 2^5 and now perform the addition
  1.0 0 0 1 1 1 1 x 2^5
+ 0.1 0 1 0 0 1 0 x 2^5
= 1.1 1 0 0 0 0 1 x 2^5
Value Represented
Value represented = (-1)^S x 0.M x 2^-126
What is the value represented by the following 32 bits in IEEE -754
REPRESENTATION?
S=0
E’= 00000000
M=11000....................00≠0=>Denormalized number
Consider three registers R1, R2, and R3 that store numbers in IEEE−754 single
precision floating point format. Assume that R1 and R2 contain the values (in
hexadecimal notation) 0x42200000 and 0xC1200000, respectively.
R1 = 0x42200000
R1 = 0100 0010 0010 0000 .......................0000
R1, S=0
E’=10000100 = 132-127=5
M= 1.010000000......0
R1, 1.0100 x 2^5 = 101000 = 40
R2= 0xC1200000
R2= 1100 0001 0010 0000 .......................0000
R2, S=1
E’=1000 0010 = 130-127=3
M= 1.010000000......0
R2, -1.0100 x 2^3 = -1010 = -10
So, R1/R2= 40/-10= -4
-4 = -100.0 x 2^0 = -1.00 x 2^2, S=1, E’=127+2=129=1000 0001, M=0000000.......0
1100 0000 1000 0000 .......................0000 = 0xC0800000
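The decoding above can be cross-checked with the standard `struct` module. The sketch assumes, as the worked solution suggests, that the quantity of interest is R1 / R2; the helper names are illustrative.

```python
import struct


def hex_to_float(pattern):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", pattern))[0]


def float_to_hex(value):
    """Return the 32-bit IEEE 754 single-precision pattern of a number."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


r1 = hex_to_float(0x42200000)
r2 = hex_to_float(0xC1200000)
r3 = float_to_hex(r1 / r2)        # assumed operation: R1 / R2
print(r1, r2, hex(r3))
```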
THANK YOU
Arithmetic
Chapter 4
Addition/subtraction of signed
numbers
At the ith stage:
Input: xi, yi and ci (the carry-in)
Output: si (the sum) and ci+1 (the carry-out to the (i+1)st stage)
xi   yi   Carry-in ci   |   Sum si   Carry-out ci+1
0    0    0             |   0        0
0    0    1             |   1        0
0    1    0             |   1        0
0    1    1             |   0        1
1    0    0             |   1        0
1    0    1             |   0        1
1    1    0             |   0        1
1    1    1             |   1        1
si = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci = xi ⊕ yi ⊕ ci   (' denotes complement)
ci+1 = yi ci + xi ci + xi yi
Example:
  X       7       0 1 1 1
+ Y  =  + 6  =  + 0 1 1 0
  Z      13       1 1 0 1
Carries into the stages (c3 c2 c1 c0) = 1 1 0 0
Legend for stage i: inputs xi, yi and carry-in ci; outputs si and carry-out ci+1
Addition logic for a single stage
[Figure: sum and carry logic circuits for stage i, and the full adder (FA) block with inputs xi, yi, ci and outputs si, ci+1]
[Figure: n-bit ripple-carry adder: a cascade of n full adders from the least significant bit (LSB) position, with carry-in c0, to the most significant bit (MSB) position, with carry-out cn]
[Figure: cascade of k n-bit adders forming a kn-bit adder]
[Figure: the n-bit ripple-carry adder with the carry-in c0 set to 1]
n-bit adder/subtractor (contd..)
[Figure: n-bit adder/subtractor: an Add/Sub control line, inputs x(n-1)..x0 and y(n-1)..y0, an n-bit adder with carry-in c0 and carry-out cn, and sum outputs s(n-1)..s0]
Computing the add time (contd..)
Cascade of 4 Full Adders, or a 4-bit adder
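x3 y3   x2 y2   x1 y1   x0 y0
c4 <- FA <- c3 <- FA <- c2 <- FA <- c1 <- FA <- c0
s3      s2      s1      s0
We can write:
ci+1 = xi yi + (xi + yi) ci = Gi + Pi ci
where Gi = xi yi (generate) and Pi = xi + yi (propagate)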
•All carries can be obtained 3 gate delays after X, Y and c0 are applied.
-One gate delay for Pi and Gi
-Two gate delays in the AND-OR circuit for ci+1
•All sums can be obtained 1 gate delay after the carries are computed.
•Independent of n, n-bit addition requires only 4 gate delays.
•This is called Carry Lookahead adder.
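The lookahead idea can be sketched for 4 bits: every carry is written directly in terms of the generate (Gi = xi yi) and propagate (Pi = xi + yi) functions and c0, so no carry waits for the previous stage. The function name `cla_add4` is illustrative, and the code models the logic only, not gate delays.

```python
def cla_add4(x, y, c0=0):
    """4-bit carry-lookahead addition; returns (4-bit sum, carry-out c4)."""
    xs = [(x >> i) & 1 for i in range(4)]
    ys = [(y >> i) & 1 for i in range(4)]
    g = [a & b for a, b in zip(xs, ys)]    # generate:  Gi = xi yi
    p = [a | b for a, b in zip(xs, ys)]    # propagate: Pi = xi + yi

    # Each carry is a two-level AND-OR function of G, P and c0 only
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (xs[i] ^ ys[i] ^ carries[i]) << i   # si = xi XOR yi XOR ci
    return total, c4


print(cla_add4(7, 6))
```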
Carry-lookahead adder
[Figure: 4-bit carry-lookahead adder: four B cells with inputs x3 y3 ... x0 y0 and carries c3, c2, c1, c0 produce s3 ... s0 and the signals G3 P3 ... G0 P0, which feed the carry-lookahead logic that produces c4]
[Figure: bit-stage cell (B cell) with inputs xi, yi, ci and outputs Gi, Pi, si]
Carry lookahead adder (contd..)
⚫ Performing n-bit addition in 4 gate delays independent of n
is good only theoretically because of fan-in constraints.
[Figure: 16-bit adder built from four 4-bit adders with carries c4, c8, c12 and c16 supplied by second-level carry-lookahead logic; carry-in c0]
[Figure: array multiplier: multiplicand bits m3 ... m0, multiplier bits q0 ... q3, partial products PP0 ... PP3, product bits p7 ... p0]
Category 1: Category 2:
Left Shift M : 11010 Right Shift PP : 00101
PP : 0101 M : 1101
ADD them ADD them
1 1010 0010 1
0101 1101
1 1111 1111 1
Sequential Circuit Multiplier
[Figure: sequential circuit multiplier: register A (initially 0), carry flip-flop C, multiplier register Q, multiplicand register M, n-bit adder, MUX, Add/Noadd control and control sequencer; C, A and Q shift right together]
Control Logic and Registers
n bit registers, 1 bit carry register C
Register set up
Q register ← multiplier
M register ← multiplicand
A register ← 0
C←0
M = 1101, Q = 1011
C   A      Q
0   0000   1011   Initial configuration
0   1101   1011   Add       First cycle
0   0110   1101   Shift
1   0011   1101   Add       Second cycle
0   1001   1110   Shift
0   1001   1110   No add    Third cycle
0   0100   1111   Shift
1   0001   1111   Add       Fourth cycle
0   1000   1111   Shift
Product = 1000 1111
1101 x 1011 = 13 x 11 = 143
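The add-and-shift cycles of the sequential multiplier can be sketched as below for unsigned n-bit operands, with variables named after the C, A, Q and M registers. The function name `shift_add_multiply` is illustrative.

```python
def shift_add_multiply(m, q, n):
    """Multiply two unsigned n-bit integers using registers C, A, Q and M."""
    a, c = 0, 0
    mask = (1 << n) - 1
    for _ in range(n):
        if q & 1:                       # q0 = 1: add M to A
            total = a + m
            c, a = total >> n, total & mask
        # q0 = 0: no add. Then shift C, A, Q right one bit position.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                 # product held in A (high) and Q (low)


print(shift_add_multiply(0b1101, 0b1011, 4))
```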
Signed Multiplication
10011 x 01011 = -13 x 11
Signed Multiplication: Rule 1
⚫ For a +ve multiplier and a -ve multiplicand, extend the sign bit value of the
multiplicand to the left as far as the product will extend.
          1 0 0 1 1    (-13)
        x 0 1 0 1 1    (+11)
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 1 1
0 0 0 0 0 0 0 0
1 1 1 0 0 1 1
0 0 0 0 0 0
1 1 0 1 1 1 0 0 0 1    (-143)
(The leading 1s of each partial product are the sign extension, shown in blue on the slide.)
10011
01011
1111110 011
11111001 1
00000000
0010011
000000
10 0 0 0 0 1 1 1 0 0 0 1
Signed Multiplication : Rule 2
0 1 0 1 1 0 1
0 0 +1 +1 + 1+1 0
0 0 0 0 0 0 0
0 1 0 1 1 0 1
0 1 0 1 1 0 1
0 1 0 1 1 0 1
0 1 0 1 1 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 0 0 0 1 1 0
Booth Algorithm
M x 0011110 [30] = M x (2^5 - 2^1)   [30 = 32 - 2 = 2^5 - 2^1]
0 1 0 1 1 0 1
0 +1 0 0 0 -1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
2's complement of
1 1 1 1 1 1 1 0 1 0 0 1 1
the multiplicand
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 1 0 1
0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 0 0 0 1 1 0
-13 x 11
+13= 01101 +11= 01011
-13= 10011
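+11 = 01011 = +1 -1 +1 0 -1 (Booth recoded)
= +2^4 - 2^3 + 2^2 - 2^0 = 16 - 8 + 4 - 1 = 11
          1  0  0  1  1     (-13, multiplicand M)
      x  +1 -1 +1  0 -1     (+11, Booth-recoded multiplier)
0 0 0 0 0 0 1 1 0 1         (-1 x M)
0 0 0 0 0 0 0 0 0           ( 0 x M)
1 1 1 1 0 0 1 1             (+1 x M)
0 0 0 1 1 0 1               (-1 x M)
1 1 0 0 1 1                 (+1 x M)
1 1 0 1 1 1 0 0 0 1         (sum)
Ans is = -(2’s comp of 1 1 0 1 1 1 0 0 0 1)
= - 10001111 = -143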
Booth Algorithm
⚫ In general, in the Booth scheme, -1 times the shifted multiplicand is selected
when moving from 0 to 1, and +1 times the shifted multiplicand is selected
when moving from 1 to 0, as the multiplier is scanned from right to left.
0 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 0 0
0 +1 -1 +1 0 - 1 0 +1 0 0 - 1 +1 - 1 + 1 0 - 1 0 0
Booth recoding of a
multiplier.
Booth Algorithm
0 1 1 0 1 ( + 13) 0 1 1 0 1
X1 1 0 1 0 (- 6) 0 - 1 +1 - 1 0
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1 1
0 0 0 0 1 1 0 1
1 1 1 0 0 1 1
0 0 0 0 0 0
1 1 1 0 1 1 0 0 1 0 ( - 78)
Ordinary multiplier:
1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0
Good multiplier:
0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
0 0 −1 +1 −1 0
0 −1 −2
Multiplier bit-pair (i+1, i)   Multiplier bit on the right (i-1)   Multiplicand selected at position i
0 0                            0                                    0 x M
0 0                            1                                   +1 x M
0 1                            0                                   +1 x M
0 1                            1                                   +2 x M
1 0                            0                                   -2 x M
1 0                            1                                   -1 x M
1 1                            0                                   -1 x M
1 1                            1                                    0 x M
[Figure: carry-save adder array producing the product bits P7 to P0]
Carry-Save Addition of Summands (Cont.)
⚫ Consider the addition of many summands, we can:
Group the summands in threes and perform carry-save addition on
each of these groups in parallel to generate a set of S and C vectors in
one full-adder delay
Group all of the S and C vectors into threes, and perform carry-save
addition on them, generating a further set of S and C vectors in one
more full-adder delay
Continue with this process until there are only two vectors remaining
They can be added in a RCA or CLA to produce the desired product
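The grouping procedure above can be sketched as follows. A carry-save adder takes three numbers and returns a sum vector S and a carry vector C with no carry propagation between bit positions; the final two vectors are added with an ordinary adder (here Python's `+`, standing in for the RCA or CLA). The function names are illustrative.

```python
def carry_save_add(a, b, c):
    """Reduce three summands to a sum vector and a carry vector (already shifted left)."""
    s = a ^ b ^ c                               # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, weighted one position up
    return s, carry


def csa_multiply(m, q, n):
    """Multiply unsigned m by the n-bit unsigned q, reducing summands with carry-save addition."""
    summands = [m << i for i in range(n) if (q >> i) & 1]
    while len(summands) > 2:
        reduced = []
        for i in range(0, len(summands) - 2, 3):       # groups of three
            reduced.extend(carry_save_add(*summands[i:i + 3]))
        reduced.extend(summands[len(summands) - len(summands) % 3:])  # leftovers
        summands = reduced
    return sum(summands)                               # final carry-propagate addition


print(carry_save_add(45, 45 << 1, 45 << 2))
print(csa_multiply(45, 63, 6))
```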
Carry-Save Addition of Summands
            1 0 1 1 0 1    (45)  M
          x 1 1 1 1 1 1    (63)  Q
            1 0 1 1 0 1    A
          1 0 1 1 0 1      B
        1 0 1 1 0 1        C
      1 0 1 1 0 1          D
    1 0 1 1 0 1            E
  1 0 1 1 0 1              F
1 0 1 1 0 0 0 1 0 0 1 1    (2,835)  Product
Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure
6.18.
            1 0 1 1 0 1    M
          x 1 1 1 1 1 1    Q
            1 0 1 1 0 1    A
          1 0 1 1 0 1      B
        1 0 1 1 0 1        C
        1 1 0 0 0 0 1 1    S1
      0 0 1 1 1 1 0 0      C1
      1 0 1 1 0 1          D
    1 0 1 1 0 1            E
  1 0 1 1 0 1              F
  1 1 0 0 0 0 1 1          S2
0 0 1 1 1 1 0 0            C2
        1 1 0 0 0 0 1 1    S1
      0 0 1 1 1 1 0 0      C1
  1 1 0 0 0 0 1 1          S2
  1 1 0 1 0 1 0 0 0 1 1    S3
0 0 0 0 1 0 1 1 0 0 0      C3
0 0 1 1 1 1 0 0            C2
0 1 0 1 1 1 0 1 0 0 1 1    S4
0 1 0 1 0 1 0 0 0 0 0    + C4
1 0 1 1 0 0 0 1 0 0 1 1    Product
Figure 6.18. The multiplication example from Figure 6.17 performed using
carry-save addition.
Integer Division
Manual Division
Decimal (274 / 13):
      21
13 ) 274
     26
      14
      13
       1
Binary (100010010 / 1101):
           10101
1101 ) 100010010
        1101
         10000
          1101
            1110
            1101
               1
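[Figure: circuit arrangement for binary division: register A (an, an-1 ... a0), register Q (qn-1 ... q0) holding the dividend and then the quotient, and register M (0, mn-1 ... m0) holding the divisor; the quotient bit q0 is set in each cycle]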
tSubtrac 1 1 1 0 1 First cycle
t q0
Se 1 1 1 1 0
tRestor 1 1
e 0 0 0 0 1 0 0 0 0
10 Shif 0 0 0 1 0 0 0 0
11 1000 tSubtrac 1 1 1 0 1
11 tSe q 1 1 1 1 1 Second cycle
0
Restor
t 1 1
10 e 0 0 0 1 0 0 0 0 0
Shif 0 0 1 0 0 0 0 0
Subtrac
t 1 1 1 0 1
tSe q0 0 0 0 0 1 Third cycle
t
Shif 0 0 0 1 0 0 0 0 1
t
Subtrac 1 1 1 0 1 0 0 1
t q0
Se 1 1 1 1 1 Fourth cycle
tRestor 1 1
e 0 0 0 1 0 0 0 1 0
Remainder Quotient
M = 0 0 0 1 1
                     A            Q
Initially            0 0 0 0 0    1 0 0 0
First cycle
  Shift              0 0 0 0 1    0 0 0 _
  Subtract           1 1 1 0 1
  Set q0             1 1 1 1 0    0 0 0 0
Second cycle
  Shift              1 1 1 0 0    0 0 0 _
  Add                0 0 0 1 1
  Set q0             1 1 1 1 1    0 0 0 0
Third cycle
  Shift              1 1 1 1 0    0 0 0 _
  Add                0 0 0 1 1
  Set q0             0 0 0 0 1    0 0 0 1
Fourth cycle
  Shift              0 0 0 1 0    0 0 1 _
  Subtract           1 1 1 0 1
  Set q0             1 1 1 1 1    0 0 1 0    (Quotient)
Restore remainder
                     1 1 1 1 1
  Add                0 0 0 1 1
  Remainder          0 0 0 1 0
A nonrestoring-division example.
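The nonrestoring procedure in this example can be sketched as below for unsigned operands. A signed Python integer stands in for register A, so the sign test becomes `a < 0`; the function name `nonrestoring_divide` is illustrative.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    a, q = 0, dividend
    q_mask = (1 << n) - 1
    for _ in range(n):
        was_negative = a < 0
        # Shift A and Q left one bit position (MSB of Q moves into A)
        a = a * 2 + ((q >> (n - 1)) & 1)
        q = (q << 1) & q_mask
        # Add M if A was negative, otherwise subtract M
        a = a + divisor if was_negative else a - divisor
        if a >= 0:
            q |= 1                    # set q0 to 1; otherwise q0 stays 0
    if a < 0:
        a += divisor                  # final step: restore the remainder
    return q, a


print(nonrestoring_divide(8, 3, 4))
```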
Floating-Point Numbers
and
Operations
Introduction to basic terms
Biased exponent means that we do not store the exponent as a signed number; we store it
only as a non-negative number.
So to get only +ve values, we use a biased exponent: we add a bias to the
actual exponent, and the result is stored as the exponent of the number.
So we take the bias as 2^(k-1) - 1 (2^7 - 1 = 127) if the exponent is represented using k (8)
bits.
Then, to perform implicit normalization, we move the binary point so that it is
immediately to the right of the first 1 in the number.
1001.01 = 1.00101 x 2^3
Steps to convert a given decimal number into IEEE Format
Step1: Convert the number into Binary.
Step 4: Add 127 to the exponent , then write the binary of it for E’
This is because IEEE 754 reserves the stored exponent values 0 and 255 (0 and 2047 in
double precision), which would correspond to the actual exponents -127 and 128 (-1023 and
1024), to represent special conditions:
- Exact zero
- Infinity
Floating point Arithmetic
Addition:
3.1415 x 10^8 + 1.19 x 10^6 = 3.1415 x 10^8 + 0.0119 x 10^8 = 3.1534 x 10^8
Multiplication:
3.1415 x 10^8 x 1.19 x 10^6 = (3.1415 x 1.19) x 10^(8+6)
Division:
3.1415 x 10^8 / 1.19 x 10^6 = (3.1415 / 1.19) x 10^(8-6)
Biased exponent problem:
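If a true exponent e is represented in excess-p notation, it is stored as e+p.
Then consider what happens under multiplication: adding the stored exponents gives
(e1 + p) + (e2 + p) = (e1 + e2) + 2p, which contains the bias twice, so p must be subtracted once.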
Floating point arithmetic: MUL rule
⚫ Add the exponents.
⚫ Subtract the bias.
⚫ Multiply the mantissas and determine the sign of the
result.
⚫ Normalize the result (if necessary).
⚫ Truncate/round the mantissa of the result.
Floating point arithmetic: DIV rule
(A) 5
(B) 6
(C) 4
(D) 7
GATE QUESTIONS
If there are m input lines and n output lines for a decoder that is used to uniquely
address a byte addressable 1 KB RAM, then the minimum value of m+n is
________ .
(A) 18
(B) 1034
(C) 10
(D) 1024
GATE QUESTIONS
A computer system with a word length of 32 bits has a 16 MB
byte-addressable main memory and a 64 KB, 4-way set associative cache
memory with a block size of 256 bytes. Consider the following four physical
addresses represented in hexadecimal notation.
NO OF SETS IN Cache = 256/4 = 64 = 2^6, I.E., 6 BITS FOR SET, 8 bits for word
Peripherals ??
Input Device
Output Device
CPU
Interface I/O
Why do we need an interface??
Data codes and formats in peripherals differ from the word format in the CPU and
memory. So conversion of formats is required.
The operating modes of peripherals are different from each other, and each must be
controlled so that a peripheral does not disturb the operation of other peripherals.
I/O Interface
I/O device is connected to the bus using an I/O interface circuit which has:
- Address decoder, control circuit, and data and status registers.
I/O Interface
Address decoder decodes the address placed on the address lines thus enabling the
device to recognize its address.
Data register holds the data being transferred to or from the processor.
Status register holds information necessary for the operation of the I/O device.
Data and status registers are connected to the data lines, and have unique addresses.
Separate Bus
I/O Mapped I/O or Isolated I/O: I/O devices and memory have separate
address space.
Common Address, Data Bus and
Control Bus
Memory Mapped I/O: From the available memory address space, some
addresses are assigned to I/O devices.
Memory Mapped I/O vs I/O Mapped I/O
DMA
Program Controlled I/O
Processor repeatedly monitors a status flag to achieve the necessary
synchronization.
DATAIN Register
SIN
STATUS Register
Program Controlled I/O
Example: Read a line from the keyboard character by character and store it in
memory starting at location LINE. Stop reading once the ‘enter’ key is pressed
and call a function named PROCESS1 to process the input.
MOVE DATAIN, R1 // Read the character from the DATAIN register of the interface into R1
MOVE R1, (R0)+ // Store the character at the memory location pointed to by R0, then advance R0
COMPARE #$0D, R1 // Check whether the character is the ASCII code of the ‘enter’ key (carriage return, $0D)
BRANCH≠0 WAITK // If the key is not ‘enter’, go back to WAITK (the status-polling loop) and read the next character
CALL PROCESS1 // If the key is ‘enter’, call the function PROCESS1
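The program-controlled loop can be mimicked in Python as a rough sketch. The Keyboard class is a made-up stand-in for the interface: its sin() method plays the role of the SIN status flag and datain() the DATAIN register. The rule that a character becomes ready on every third poll is an arbitrary assumption used only to show the busy-wait.

```python
ENTER = 0x0D  # ASCII carriage return, the code compared against in the example

class Keyboard:
    """Hypothetical keyboard interface with a SIN flag and a DATAIN register."""
    def __init__(self, text):
        self._pending = [ord(c) for c in text]
        self._polls = 0

    def sin(self):
        """Return 1 when DATAIN holds a new character, else 0."""
        if not self._pending:
            raise RuntimeError("no more input available")
        self._polls += 1
        return 1 if self._polls >= 3 else 0  # assumed device delay

    def datain(self):
        """Read the character; reading clears the ready condition."""
        self._polls = 0
        return self._pending.pop(0)

def read_line(kbd):
    line = []                    # plays the role of memory starting at LINE
    while True:
        while kbd.sin() == 0:    # WAITK: poll the status flag
            pass
        ch = kbd.datain()        # MOVE DATAIN, R1
        line.append(ch)          # MOVE R1, (R0)+
        if ch == ENTER:          # COMPARE #$0D, R1
            return line          # fall through to CALL PROCESS1

print(read_line(Keyboard("hi\rignored")))
```

Every pass through the inner loop is processor time spent only on checking the flag, which is the cost that interrupt-driven I/O tries to avoid.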
Interrupt Driven I/O
Interrupt Driven I/O
Polling method: the processor waits for a response from the I/O device. During the
wait period the processor is not able to perform useful computation.
An I/O device can alert the processor when it becomes ready. It can do
so by sending a hardware signal called an interrupt request to the
processor.
Interrupt Example 1
When an interrupt occurs during the execution of a program, the processor registers in
use and the flag (status) information must be saved on the stack, and restored before
execution of the interrupted program is resumed.
In this way, the original program can continue execution without being affected in
any way by the interruption, except for the time delay.
The task of saving and restoring information can be done automatically by the
processor or by program instructions.
Sequence of events in response to an
Interrupt Request
Interrupt Driven IO
The process of saving and restoring registers involves memory transfers that increase
the total execution time, and hence represent execution overhead.
Saving registers also increases the delay between the time an interrupt request is
received and the start of execution of the interrupt-service routine.
2. The processor COMPLETES the execution of the current instruction, interrupts the
program currently being executed, and saves the contents of the PC and the
status/flag register.
[3. Interrupts are disabled by clearing the IF bit in the status/flag register to 0.]
5. Upon completion of the interrupt-service routine, the saved contents of the PC and
Status registers are restored (enabling interrupts by setting the IF bit to 1), and
execution of the interrupted program is resumed.
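The save, disable, and restore steps can be illustrated with a toy Python model. The class name, the starting PC value 0x100, and the ISR address 0x200 are assumptions made for this sketch. The steps not listed above are filled in only loosely: the ISR itself is represented by a callback.

```python
class CPU:
    """Toy processor state: program counter, IF bit, and a stack."""
    def __init__(self):
        self.pc = 0x100             # hypothetical address in the main program
        self.interrupt_enable = 1   # IF bit of the status/flag register
        self.stack = []

    def handle_interrupt(self, isr_address, isr_body):
        # Step 2: save the PC and the status/flag register.
        self.stack.append((self.pc, self.interrupt_enable))
        # Step 3: disable further interrupts by clearing IF.
        self.interrupt_enable = 0
        # Run the interrupt-service routine (represented by a callback).
        self.pc = isr_address
        isr_body(self)
        # Step 5: restore PC and status; the saved IF=1 re-enables interrupts.
        self.pc, self.interrupt_enable = self.stack.pop()

cpu = CPU()
seen = []
cpu.handle_interrupt(0x200, lambda c: seen.append((c.pc, c.interrupt_enable)))
print(seen, hex(cpu.pc), cpu.interrupt_enable)
```

The push and pop on the stack are the memory transfers that make up the execution overhead mentioned above.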
Interrupt Hardware
INTR= INTR1+INTR2+…..+INTRn
It is essential to ensure that this active request signal does not lead to successive
interruptions, causing the system to enter an infinite loop from which it cannot
recover.
In some situations interrupts have to be ignored, e.g. a PRINT interrupt from the
printer cannot be serviced by the processor if COMPUTE is not ready with
text to print.
Enabling and Disabling of Interrupts
First Possibility
Ignore the Interrupt Request until the completion of the current ISR
The processor hardware ignores the interrupt request line until the execution of
the first instruction of the ISR has been completed.
That is, the first instruction of the ISR is Interrupt Disable (DI) and the last
instruction before the return is Interrupt Enable (EI).
ISR
DI
.....
......
.....
EI
IRET
Enabling and Disabling of Interrupts
Second Possibility:
The processor automatically disables interrupts before starting the execution of
the ISR.
On entry to an ISR
1. Processor first saves the contents of the program counter (PC) and the
processor status (PS) register with IF=1 on stack
2. Automatically disable interrupts before starting the execution of the
interrupt-service routine.
On Exit from an ISR
When the return-from-interrupt instruction is executed, the contents of PC and PS
(with IF=1) are popped from the stack, which enables interrupts again.
Enabling and Disabling of Interrupts
[3rd Method]
The processor must accept the request on the INTR line only at the leading edge of the signal (edge-triggered line).
Processor receives only one request regardless of how long the line is activated.
Handling Multiple Requests
When a number of devices can send interrupt requests on a common line INTR
connected to the processor, how can the processor determine which device is
requesting an interrupt?
Solution: POLLING
When an interrupt request is received it is necessary to identify the particular device
that has raised the request.
The simplest way to identify the interrupting device is to have the ISR poll all the I/O
devices connected to the bus.
The first device encountered with its IRQ bit set is the device that should be serviced.
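A minimal sketch of this polling scheme in Python, assuming each device exposes its IRQ bit as 0 or 1. The device names and the function name poll_devices are hypothetical.

```python
def poll_devices(devices):
    """devices: list of (name, irq_bit) in polling order.
    Return the first device whose IRQ bit is set and how many were checked."""
    for checked, (name, irq_bit) in enumerate(devices, start=1):
        if irq_bit == 1:
            return name, checked
    return None, len(devices)

devices = [("keyboard", 0), ("disk", 1), ("printer", 1)]
print(poll_devices(devices))
```

The second value returned counts the status registers that had to be read, which grows with the number of devices on the bus.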
Handling Multiple Requests
The Polling method is easy to implement.
The main drawback is the time spent in interrogating the IRQ bits of all the devices
that may not be requesting any service.
A device requesting an interrupt can identify itself if it has its own interrupt-request
signal, or if it can send a special code (like the memory address of its ISR) to the
processor.
A commonly used scheme is to allocate permanently an area in the memory to hold the
addresses of interrupt-service routines. These addresses are usually referred to as
interrupt vectors, and they are said to constitute the interrupt-vector table.
Handling Multiple Requests
When an interrupt request arrives, the information provided by the requesting
device is used as a pointer into the interrupt-vector table, and the address in the
corresponding interrupt vector is automatically loaded into the program counter.
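The table lookup can be sketched as below. The table base address 0x0000, the 4-byte entry size, and the ISR addresses are all assumptions chosen for illustration; memory is modelled as a Python dictionary.

```python
TABLE_BASE = 0x0000  # assumed start of the interrupt-vector table
ENTRY_SIZE = 4       # assumed size of one interrupt vector in bytes

# Hypothetical memory contents: each table entry holds the address of an ISR.
memory = {
    TABLE_BASE + 0 * ENTRY_SIZE: 0x2000,  # ISR for vector code 0
    TABLE_BASE + 1 * ENTRY_SIZE: 0x2400,  # ISR for vector code 1
    TABLE_BASE + 2 * ENTRY_SIZE: 0x2800,  # ISR for vector code 2
}

def load_pc_from_vector(vector_code):
    """Use the device's code as an index into the table and
    return the ISR address that is loaded into the program counter."""
    return memory[TABLE_BASE + ENTRY_SIZE * vector_code]

pc = load_pc_from_vector(1)
print(hex(pc))
```

No device has to be interrogated here, which is the advantage over polling.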
Some Points
IO devices send interrupt vector code over the data bus.
When a device sends an interrupt request, the processor may not be ready to receive the
interrupt-vector code.
So, when the processor is ready to receive the interrupt vector code, it sends the
INTA signal to the device.
Only after receiving the INTA signal, the device places the interrupt vector
code on the data bus.
Nesting of Interrupt Requests
and
Simultaneous Interrupt Requests
Interrupt Nesting
If requests come from more than one device:
Sometimes a device needs an immediate response from the processor, e.g. the system
clock or a real-time system.
Interrupt requests from higher-priority devices will be accepted.
Interrupt requests will be accepted from some devices but not from others, depending
upon the device’s priority. To implement this scheme, we can assign a priority level to
the processor that can be changed under program control.
The processor’s priority can be encoded in a few bits of the processor status register.
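A rough Python sketch of this acceptance rule follows. It assumes that a lower number means a higher priority, that the processor takes on the priority of the device it is servicing, and that requests arrive one after another while the earlier routines are still running. The device names and levels are hypothetical.

```python
def accepted_requests(requests, processor_level):
    """requests: list of (device, level) in arrival order.
    A request is accepted only if its level is higher in priority
    (numerically lower) than the processor's current level."""
    accepted = []
    for device, level in requests:
        if level < processor_level:
            accepted.append(device)
            processor_level = level  # processor now runs at the device's priority
    return accepted

arrivals = [("printer", 5), ("disk", 3), ("keyboard", 4), ("clock", 1)]
print(accepted_requests(arrivals, processor_level=7))
```

The keyboard request is held back in this run because the processor is already servicing the higher-priority disk.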
Interrupt Nesting
Multiple priority scheme implemented using individual INTR and INTA lines.
Priority arbitration circuit: A logic circuit which combines all interrupts but
allows only the highest-priority request.
Interrupt Nesting
Two types of Priority Scheme:
Fixed Priority: A lower number indicates a higher priority.
Rotating Priority: Once a device is serviced, that device becomes the lowest
priority device and the device next in sequence becomes the highest priority
device.
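Rotating priority can be sketched with a list kept in priority order, highest first. The device names D0 to D3 and both function names are assumptions made for this illustration.

```python
def select(order, pending):
    """Return the highest-priority device in order that has a pending request."""
    for device in order:
        if device in pending:
            return device
    return None

def rotate_after_service(order, serviced):
    """Make the serviced device the lowest priority and the next
    device in sequence the highest priority."""
    i = order.index(serviced)
    return order[i + 1:] + order[:i + 1]

order = ["D0", "D1", "D2", "D3"]
chosen = select(order, {"D1", "D3"})
order = rotate_after_service(order, chosen)
print(chosen, order)
```

With fixed priority the list would simply never be rotated, so D0 would always win when it requests service.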
Handling Multiple Devices
[Simultaneous Requests]
Daisy Chain
INTA is connected in daisy chain fashion where INTA signal propagates serially
through the devices.
Handling Multiple Devices
[Simultaneous Requests]
When an interrupt request is active on the INTR line, INTA is sent to Device 1, which
passes it on to Device 2 if Device 1 does not require any service.
In daisy chain, the device that is electrically closest to the processor will
have the highest priority.
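The propagation of INTA along the chain can be modelled as a simple scan, sketched below in Python. Index 0 stands for the device electrically closest to the processor; the function name is hypothetical.

```python
def daisy_chain(requests):
    """requests[i] is True when device i has raised an interrupt request.
    Return (serviced device index, devices that passed INTA on)."""
    passed_on = []
    for position, requesting in enumerate(requests):
        if requesting:
            return position, passed_on  # this device keeps INTA and is serviced
        passed_on.append(position)      # not requesting: forward INTA downstream
    return None, passed_on

print(daisy_chain([False, True, True]))
```

Device 2 also requested service in this run, but INTA never reaches it while device 1 is requesting.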
Handling Multiple Devices
Combining the multiple-priority and daisy chain schemes:
devices are organized in groups, and each group has a different priority level.
Direct Memory Access
To transfer large blocks of data at high speed, an alternative approach is used.
DMA transfers are performed by a control circuit that is part of the I/O device
interface called DMA controller.
For each word transferred, the DMA controller provides the memory address and all
the bus signals that control data transfer, functions the processor would otherwise perform.
Since DMA controller has to transfer blocks of data, the DMA controller must
increment the memory address for successive words and keep track of the number
of transfers.
Registers used in DMA Operation
The DMA controller has a number of registers that are accessed by the processor to initiate
transfer operations.
Registers used in DMA Operation
One register is used for storing the Starting address to/from which the
communication will take place.
The word count register is used for storing the COUNT of units to be transferred.
When the controller has completed transferring a block of data and is ready to
receive another command, it sets the Done flag to 1.
Bit 30 is the Interrupt-enable flag, IE. When this flag is set to 1, it causes the
controller to raise an interrupt after it has completed transferring a block of data.
The controller sets the IRQ bit to 1 when it has requested an interrupt.
DMA Operation
DMA Transfer
When transfer of data between a peripheral and memory is needed, the peripheral
places its request to the DMA controller attached to it.
The DMA controller sends Bus Request(HOLD) signal to the CPU for releasing
the control of the system bus.
The CPU initializes the DMA controller by sending the following information
through the data bus.
1. The starting address of the memory block where the data are available (for read)
or where the data need to be stored (for write operation).
2. The word count which is the number of bytes in the memory block.
It continues to transfer data between memory and peripheral unit until the entire
block is transferred.
For each transfer, memory address is incremented and the word count is
decremented.
After the transfer is over, the DMA Controller sends an interrupt request to the
processor.
In response to this interrupt, the processor takes back control of the system bus.
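The register behaviour during a block transfer can be sketched with a toy Python model. The class and attribute names are assumptions; the bus request and grant handshake is left out, and memory is a plain list addressed in whole transfer units.

```python
class DMAController:
    """Toy DMA controller with address, count, Done, IE and IRQ fields."""
    def __init__(self):
        self.address = 0
        self.count = 0
        self.done = 0
        self.ie = 0
        self.irq = 0

    def program(self, start_address, word_count, interrupt_enable=1):
        """Initialization performed by the processor before the transfer."""
        self.address = start_address
        self.count = word_count
        self.ie = interrupt_enable
        self.done = 0
        self.irq = 0

    def transfer(self, memory, device_words):
        """Move count units from the device into memory."""
        source = iter(device_words)
        while self.count > 0:
            memory[self.address] = next(source)
            self.address += 1   # address incremented for each transfer
            self.count -= 1     # word count decremented for each transfer
        self.done = 1           # block finished
        if self.ie:
            self.irq = 1        # raise the interrupt request

memory = [0] * 8
dma = DMAController()
dma.program(start_address=2, word_count=3)
dma.transfer(memory, [10, 20, 30, 40])
print(memory, dma.address, dma.count, dma.done, dma.irq)
```

The processor is involved only in program() and in responding to the final interrupt; the loop in transfer() runs without it.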
Modes of DMA Transfer: Cycle
Stealing and Burst Mode
In Cycle Stealing, the DMA Controller takes the control of buses from CPU for
transferring one word of data in one cycle and returns the control to the CPU.
Alternatively, the DMA controller may be given exclusive access to the main
memory to transfer a block of data without interruption. This is known as block or
burst mode.
Problem [GATE 2016]
The size of the data count register of a DMA controller is 16 bits.
The processor needs to transfer a file of 29,154 kilobytes from disk to main
memory.
The memory is byte addressable. The minimum number of times the DMA
controller needs to get the control of the system bus from the processor to
transfer the file from the disk to main memory is...............
Solution:
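Using a 16-bit count register, the maximum count is 2^16 - 1 bytes, i.e. just under 2^6 x 1024 bytes (64 KB) per bus grant.
So the controller needs the bus at least ceil(29,154 KB / 64 KB) = 456 times.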
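The count can be checked with a few lines of Python. The sketch assumes 1 kilobyte = 1024 bytes and tries both readings of the register limit (2^16 - 1 bytes and the rounded 64 KB); the function name is hypothetical.

```python
def min_bus_grants(file_bytes, bytes_per_grant):
    """Smallest number of bus grants needed to move file_bytes."""
    return -(-file_bytes // bytes_per_grant)  # integer ceiling division

file_bytes = 29154 * 1024                     # file size in bytes
exact = min_bus_grants(file_bytes, 2**16 - 1) # largest value a 16-bit count holds
rounded = min_bus_grants(file_bytes, 2**16)   # 2^6 x 1024 bytes = 64 KB
print(exact, rounded)
```

Both readings give the same number of grants for this file size.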
The designer of the system also has the alternative approach of using a DMA
controller to implement the same transfer. The DMA controller requires 20 clock
cycles for initialization and other overheads. Each DMA transfer cycle takes two
clock cycles to transfer one byte of data from the device to memory.
Problem [GATE 2016]
What is the approximate speed up when the DMA controller based design is
used in place of interrupt-driven program based input-output?
Solution:
With the interrupt technique,
the time taken is: 1 + 1 + (2+2+1+1+1) x 500 CC = 3502 CC
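The comparison can be worked through in Python as a sketch. It assumes the transfer is 500 bytes (the factor in the expression above) and uses the DMA figures stated in the problem: 20 clock cycles of overhead and 2 clock cycles per byte. The function names are hypothetical.

```python
def interrupt_cycles(num_bytes):
    """2 setup cycles plus a 7-cycle loop body per byte."""
    return (1 + 1) + (2 + 2 + 1 + 1 + 1) * num_bytes

def dma_cycles(num_bytes):
    """20 cycles of initialization plus 2 cycles per byte transferred."""
    return 20 + 2 * num_bytes

n = 500
speedup = interrupt_cycles(n) / dma_cycles(n)
print(interrupt_cycles(n), dma_cycles(n), round(speedup, 2))
```

Under these assumptions the DMA design takes 1020 clock cycles, giving a speed-up of roughly 3.4.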