COA - Bindu Agarwalla Notes

Course Handout, Computer Architecture, CS-2006, Sec: CSSE-2

Faculty Name:Bindu Agarwalla


Day      Chapter                              Topic                                                                 No. of lectures   Remark
Day-1    Basic Structure of Computers         Introduction, Computer Types                                          1
Day-2    Basic Structure of Computers         CA and CO and their relationship                                      1
Day-3    Basic Structure of Computers         Von-Neumann Vs Harvard concept                                        1
Day-4    Basic Structure of Computers         Functional units, Basic operational concepts; Bus Structures and Types  1
Day-5    Machine Instructions and Programs    Memory location and Addressing mechanism                              1
Day-6    Machine Instructions and Programs    Big- and Little-Endian schemes                                        1
Day-7    Machine Instructions and Programs    Memory operations, Instruction and instruction sequencing             1
Day-8    Machine Instructions and Programs    Instruction Format, Instruction length (0,1,2,3 address) with problem  1
Day-9    Machine Instructions and Programs    Different CPU organization                                            1
Day-10   Machine Instructions and Programs    Addressing modes 1                                                    1
Day-11   Machine Instructions and Programs    Addressing modes 2                                                    1
Day-12   Machine Instructions and Programs    Assembly Language                                                     1
Day-13   Machine Instructions and Programs    Basic Input and Output Operations, Subroutines                        1
Day-14   Machine Instructions and Programs    Additional Instructions (Logic and Shift/Rotate Instructions)         1
Day-15   Basic Processing Unit                Fundamental concept, Steps taken by CPU                               1
Day-16   Basic Processing Unit                Single bus CPU organization, Execution of a complete instruction      1
Day-17   Basic Processing Unit                Control signals required for an instruction                           1
Day-18   Basic Processing Unit                Multiple bus CPU organization                                         1
Day-19   Basic Processing Unit                Design of control unit: Hardwired                                     1
Day-20   Basic Processing Unit                Design of control unit: Micro programmed                              1
Day-21   Basic Processing Unit                TUTORIAL/ACTIVITY                                                     1
Day-22   Memory Organization                  Basic concepts, Memory hierarchy and its need                         1
Day-23   Memory Organization                  Parameters used to measure the performance                            1
Day-24   Memory Organization                  Types of memory components, Semiconductor RAM memories                1
Day-25   Memory Organization                  Memory Module Design                                                  1
Day-26   Memory Organization                  ROM                                                                   1
Day-27   Memory Organization                  Cache memories                                                        1
Day-28   Memory Organization                  Mapping functions                                                     1
Day-29   Memory Organization                  Replacement algorithms                                                1
Day-30   Memory Organization                  Memory Interleaving                                                   1
Day-31   Memory Organization                  Memory performance consideration                                      1
Day-32   Memory Organization                  Virtual memory organization                                           1
Day-33   Memory Organization                  TUTORIAL/ACTIVITY                                                     1
Day-34   ALU                                  Design of Adder (n-bit ripple carry adder, carry look ahead adder)    1
Day-35   ALU                                  Multiplication of Positive Numbers                                    1
Day-36   ALU                                  Signed Operand Multiplication                                         1
Day-37   ALU                                  Fast Multiplication                                                   1
Day-38   ALU                                  Integer Division (Restoring and non-restoring)                        1
Day-39   ALU                                  IEEE Floating-point Numbers and its Operations (Single and double precision)  1
Day-40   ALU                                  TUTORIAL/ACTIVITY                                                     1
Day-41   ALU                                  TUTORIAL/ACTIVITY                                                     1
Day-42   I/O Organization                     Basics of I/O operations                                              1
Day-43   I/O Organization                     Accessing I/O Devices                                                 1
Day-44   I/O Organization                     Memory mapped I/O and I/O mapped I/O                                  1
Day-45   I/O Organization                     Interrupts                                                            1
Day-46   I/O Organization                     DMA                                                                   1
Day-47   I/O Organization                     Interface Circuits                                                    1
Day-48   I/O Organization                     Standard I/O Interfaces - PCI Bus, SCSI Bus, USB                      1
Day-49   I/O Organization                     Flynn's Classification (SISD, SIMD, MISD, MIMD)                       1
Day-50   I/O Organization                     RISC vs CISC                                                          1

Syllabus
CS 2006 COMPUTER ORGANIZATION AND ARCHITECTURE Cr- 4

Basic Structure of Computers: Computer Types, Functional Units, Basic Operational
Concepts, Bus Structures.
Machine Instructions and Programs: Memory Location and
Addressing mechanism, Memory Operations, Encoding of machine instructions, Address
modes, Instructions, Instruction formats, Instruction length, Assembly Language,
Subroutines, Additional Instructions, RISC vs CISC.
Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction,
Single and Multiple Bus Organization, Hardwired Control, Micro programmed Control unit.
Arithmetic: Design of fast adders, Multiplication of Positive Numbers, Signed Operand
Multiplication, Fast Multiplication, Integer Division, Floating-point Numbers and Operations.
Memory System: Basic Concepts, Semiconductor RAM Memories, Read Only Memories,
Speed, Size, and Cost, memory module design, Cache Memories – Mapping Functions,
Replacement Algorithms, Memory interleaving, Memory Performance Considerations,
Virtual Memories.
Input/ Output Organization: Basic Input and Output Operations, Accessing I/O Devices,
Interrupts – Interrupt Hardware, Enabling and Disabling Interrupts, Handling Multiple
Devices, Controlling Device Requests, Exceptions, Direct Memory Access. Interface
Circuits, Standard I/O Interfaces – PCI Bus, SCSI Bus, USB, Flynn’s Classification, RISC vs
CISC
Case Study: IA-32 Register structure, IA-32 addressing modes, IA-32 Instructions,
Instruction format, IA-32 Assembly language, Program flow, Logic and shift/Rotate
Instructions for IA-32, Programming examples.
Text Book
1. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, TMH, 5th
Edition, 2002.
2. M. Morris Mano, Computer System Architecture, Pearson Education India, 3rd Edition
Reference Book
1. Computer Organization & Architecture, William Stallings, 7th Edition, PHI, 2006.
2. John P. Hayes, Computer Organization & Architecture, TMH
Evaluation Methodology:

Activities: 30 Marks
Mid Semester Exam: 20 Marks
End Semester Exam: 50 Marks
__________
Total 100 Marks
Computer Organization and
Architecture(CS 2006)

Bindu Agarwalla

COA
1. Computer??
2. Organization??
3. Architecture??

A computer is an electronic device that accepts input from the outside world,
processes it according to predefined instructions, and produces output for the
outside world.
Computer
1. Input Devices
2. Memory
3.Processor
4. Output Devices
Memory
2. Memory
Primary Memory: RAM and ROM
Meaning of Memory capacity
bit
byte
nibble
word
Secondary memory

A processor is connected to a 128G × 32 memory module. What are the widths
of its address bus and data bus?
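One way to work the question above, sketched in Python. It assumes the usual reading of "128G × 32": 128G addressable locations of 32 bits each, with G = 2^30. The helper name `bus_widths` is illustrative only.

```python
def bus_widths(num_locations: int, word_bits: int) -> tuple[int, int]:
    """Return (address_bus_bits, data_bus_bits) for a num_locations x word_bits memory."""
    # The address bus must be wide enough to number every location:
    # the largest address is num_locations - 1.
    address_bits = (num_locations - 1).bit_length()
    # The data bus carries one full word per transfer.
    return address_bits, word_bits


G = 2 ** 30
address_bits, data_bits = bus_widths(128 * G, 32)
print(address_bits, data_bits)  # 128G = 2^7 * 2^30 = 2^37 locations
```

Under these assumptions the address bus needs 37 lines and the data bus 32 lines.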
Processor
2. CPU
ALU
REGISTER SET
CONTROL UNIT

One instruction requires 7 clock cycles to complete its execution. How much time
is required for that instruction if the processor speed is 5 GHz?
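A short sketch of the arithmetic behind this question: one clock cycle lasts 1/f seconds, so an instruction of k cycles takes k/f seconds. The function name is hypothetical.

```python
def instruction_time_ns(cycles: int, clock_hz: float) -> float:
    """Time for one instruction, in nanoseconds, given its cycle count and the clock rate."""
    seconds = cycles / clock_hz
    return seconds * 1e9


# 7 cycles at 5 GHz: each cycle is 0.2 ns, so the instruction takes about 1.4 ns.
print(instruction_time_ns(7, 5e9))
```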
Registers
1. Dedicated:
PC
MAR
MDR
IR
SP

PC: Program counter. It points to the next instruction to be executed.

MAR: Memory Address Register. It contains the address of the memory
location on which a read/write operation is going to take place.

MDR: Memory Data Register. It contains the data that has been read from
memory during a read operation, or the data to be written into memory during a write operation.

IR: Instruction Register. It contains the current instruction being executed.


Registers
SP: Stack Pointer. It points to the top of the stack. The stack grows
downward in memory. For a push operation, SP is decremented first and
then the item is stored; for a pop operation, the item is read first and
then SP is incremented.
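The push and pop rules for a downward-growing stack can be sketched as follows. This is a minimal illustration, assuming a word-addressable memory and an SP that starts just past the last location; the class name `DownwardStack` is made up for the example.

```python
class DownwardStack:
    """Stack that grows toward lower addresses, as described for SP."""

    def __init__(self, size: int):
        self.memory = [0] * size
        self.sp = size  # assumption: empty stack, SP just past the last word

    def push(self, value: int) -> None:
        self.sp -= 1                  # decrement SP first
        self.memory[self.sp] = value  # then store at the new top

    def pop(self) -> int:
        value = self.memory[self.sp]  # read the top item first
        self.sp += 1                  # then increment SP
        return value


stack = DownwardStack(8)
stack.push(5)
stack.push(9)
print(stack.sp)     # SP moved down by two words
print(stack.pop())  # last item pushed comes out first
```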

General Purpose Registers:


R0.......RN-1
These are programmer visible registers that can be used as operands in an
instruction.
Basic Operational Concept
How is an instruction executed?
Basic Operational Concept
Instruction Cycle: It consists of:
a. Fetch the instruction pointed by PC into IR
b. Decode the instruction.
c. Fetch the operands (if required)
d. Execute the instruction.
e. Store the result into memory( if required)
Here steps <c> and <e> are optional.

a.Fetch the instruction pointed by PC into IR:


i. [PC]→ MAR
ii. Generate the 'Read' Signal.
iii. Wait for the MFC(Memory Function to Complete)
iv. [MDR]→ IR

b. Decode the instruction: The decoder is connected to IR. Hence, the
instruction is decoded as soon as it is loaded into IR, and the operation to be
performed is identified.
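The fetch steps can be traced with a toy model. This is only a sketch: memory is a Python dictionary, the Read signal and the wait for MFC are folded into one method, and the PC increment by one word is an assumption (the slides do not list it as a separate fetch step here).

```python
class ToyCPU:
    """Minimal model of the PC, MAR, MDR and IR registers during a fetch."""

    def __init__(self, memory: dict, pc: int):
        self.memory = memory
        self.pc = pc
        self.mar = None
        self.mdr = None
        self.ir = None
        self.memory_reads = 0

    def read(self) -> None:
        # 'Read' signal, then wait for MFC: modelled as an immediate lookup.
        self.mdr = self.memory[self.mar]
        self.memory_reads += 1

    def fetch(self) -> None:
        self.mar = self.pc  # i.   [PC] -> MAR
        self.read()         # ii, iii. Read, wait for MFC
        self.ir = self.mdr  # iv.  [MDR] -> IR
        self.pc += 1        # assumption: word-addressable, one-word instruction


cpu = ToyCPU({100: "ADD R1, R2"}, pc=100)
cpu.fetch()
print(cpu.ir, cpu.pc, cpu.memory_reads)
```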
Basic Operational Concept
Example 1: ADD R1, R2
Steps:
1. [PC]→ MAR
2. Generate the 'Read' Signal.

3. Wait for the MFC(Memory Function to Complete)

4. [MDR]→ IR
5. Decode the instruction.

6. Execute the instruction.


ALU will perform the addition on the contents of R1 and R2 and
result of addition will be stored into R2.
Basic Operational Concept
Example 2: ADD A, R1
Steps:
1. [PC]→ MAR
2. Generate the 'Read' Signal.

3. Wait for the MFC(Memory Function to Complete)

4. [MDR]→ IR
5. Decode the instruction.

6. Fetch the operand stored at memory location A


I. Address part of IR→ MAR
II. Generate the 'Read' Signal.
III. Wait for the MFC(Memory Function to Complete)
IV. [MDR]→ Input of ALU
7. Execute the instruction.


ALU will perform the addition on the contents of memory
location A and R1 and result of addition will be stored into R1.

How many times is memory referenced to execute the instruction
ADD A, R1

2 times.
1. To fetch the instruction.
2. To fetch the operand from memory location A
Basic Operational Concept
Example 3: ADD R1, A
Steps:
1. [PC]→ MAR
2. Generate the 'Read' Signal.

3. Wait for the MFC(Memory Function to Complete)

4. [MDR]→ IR
5. Decode the instruction.

6. Fetch the operand stored at memory location A


I. Address part of IR→ MAR
II. Generate the 'Read' Signal.
III. Wait for the MFC(Memory Function to Complete)
IV. [MDR]→ Input of ALU
7. Execute the instruction.


ALU will perform the addition on the contents of memory
location A and R1.

8. Store the result into the memory location A.


I. Address part of IR→ MAR

II. MDR← Result from ALU

III. Generate the 'Write' Signal.

IV. Wait for the MFC(Memory Function to Complete)


Basic Operational Concept
How many times is memory referenced to execute the instruction
ADD R1, A

3 times.
1. To fetch the instruction.
2. To fetch the operand from memory location A
3. To store the result into memory location A

How many times is memory referenced to execute the instruction
ADD R1, R2
Answer: 1

How many times is memory referenced to execute the instruction
MOV R1, A
Answer: 2

How many times is memory referenced to execute the instruction
MOV A, R1
Answer: 2
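The counting rule used in these answers can be written down as a small function. This is a sketch under the conventions of the examples above: two-operand instructions, the second operand is the destination, registers are named R0, R1, ..., and any other name is a memory location. The helper names are hypothetical.

```python
def is_register(operand: str) -> bool:
    """Treat R0, R1, ... as registers; anything else is a memory location."""
    return operand.upper().startswith("R") and operand[1:].isdigit()


def memory_references(opcode: str, src: str, dst: str) -> int:
    refs = 1                      # fetching the instruction itself
    if not is_register(src):
        refs += 1                 # read the source operand from memory
    if not is_register(dst):
        if opcode.upper() != "MOV":
            refs += 1             # arithmetic needs the old destination value
        refs += 1                 # write the result back to memory
    return refs


for instruction in [("ADD", "R1", "R2"), ("ADD", "A", "R1"),
                    ("ADD", "R1", "A"), ("MOV", "R1", "A"), ("MOV", "A", "R1")]:
    print(instruction, memory_references(*instruction))
```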
Bus Structures
What is Bus??
Types:(Based on Number)
Single Bus
Multibus

Types:(Based on type of information carrying)


Address Bus
Data bus
Control Bus
Bus Structures
Types:(Based on Connection)
Internal Bus
External bus
Concept of Interrupt
INTR
INTA
ISR
Questions
A processor is connected to a 4G × 32-bit memory module. A program is
stored starting at memory address 100, and the maximum length of each
instruction of the program is 32 bits. Find the sizes of MAR, MDR and IR,
and the content of PC.

A processor has 48-bit instructions composed of two fields: the first two bytes
contain the opcode and the remainder a memory operand address. How many bits
are needed for the Program Counter and the Instruction Register?
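A possible way to reason through both questions, sketched in Python. The assumptions are stated in the comments: MAR is as wide as a memory address, MDR is as wide as a memory word, IR holds one complete instruction, and PC holds a memory address. The function names are illustrative.

```python
def address_bits(num_locations: int) -> int:
    """Bits needed to address num_locations distinct locations."""
    return (num_locations - 1).bit_length()


def question_1() -> dict:
    # 4G x 32-bit memory, program starts at address 100, 32-bit instructions.
    return {
        "MAR": address_bits(4 * 2 ** 30),  # one address: 2^32 locations
        "MDR": 32,                         # one memory word
        "IR": 32,                          # one (maximum-length) instruction
        "PC": 100,                         # assumption: before the first fetch
    }


def question_2() -> dict:
    instruction_bits = 48
    opcode_bits = 2 * 8                    # first two bytes
    operand_address_bits = instruction_bits - opcode_bits
    return {
        "PC": operand_address_bits,        # PC must hold any memory address
        "IR": instruction_bits,            # IR must hold a whole instruction
    }


print(question_1())
print(question_2())
```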
Questions
a) At the end of a memory read operation, the MDR is loaded with a
binary pattern. How is that pattern interpreted as an instruction
or as an operand of an instruction?

If the memory operation was initiated by sending the contents of PC to
MAR, then the content of MDR is interpreted as an instruction; otherwise
it is interpreted as an operand.
Thank You
CPU Organization

Bindu Agarwalla

CPU Organization
1.Single Accumulator Organization (One Address Instruction)

2. General Register Organization (Two and Three Address Instruction)

3.Stack Organization (Zero Address Instruction)

To understand the topic we will take an example and will discuss all the
organizations.

X=(A+B) * (C+D)
CPU Organization
Three Address Instruction: X=(A+B) * (C+D)

General format:
Opcode destination, src1, src2
Note: The order of operands varies from architecture to architecture.

ADD R1, A, B //R1←Mem[A] +Mem[B]

ADD R2, C, D //R2←Mem[C] +Mem[D]

MUL X, R1, R2 //Mem[X]←[R1] * [R2]


CPU Organization
Two Address Instruction: X=(A+B) * (C+D)

General format:
Opcode src1/dest, src2
Note: The order of operands varies from architecture to architecture.

MOV R1, A //R1←Mem[A]

ADD R1, B //R1←[R1] +Mem[B]

MOV R2, C //R2←Mem[C]

ADD R2, D //R2←[R2] +Mem[D]

MUL R1, R2 //R1←[R1] * [R2]

MOV X,R1 //Mem[X]←[R1]


CPU Organization
One Address Instruction: X=(A+B) * (C+D)

This uses the accumulator (AC) register for all data manipulations. Here AC
is assumed to be one of the operands for all the instructions.
LOAD: from memory to accumulator
STORE: from accumulator to memory

LOAD A //Acc←Mem[A]

ADD B //Acc←[Acc] +Mem[B]

STORE T //Mem[T]←[Acc]

LOAD C //Acc←Mem[C]

ADD D //Acc←[Acc] +Mem[D]

MUL T //Acc←[Acc] *Mem[T]

STORE X //Mem[X]←[Acc]
CPU Organization
Zero Address Instruction: X=(A+B) * (C+D)

A stack-organized computer doesn't use an address field for
instructions like ADD, MUL, XOR etc. An operand is specified only for
Push and Pop operations. The top two items of the stack are popped,
the operation is performed on them, and the result is pushed back
onto the stack.
PUSH A //Stack[top]←Mem[A]
PUSH B //Stack[top]←Mem[B]
ADD //Stack[top-1]←Stack[top-1]+Stack[top]
PUSH C //Stack[top]←Mem[C]
PUSH D //Stack[top]←Mem[D]
ADD //Stack[top-1]←Stack[top-1]+Stack[top]
MUL //Stack[top-1]←Stack[top-1]*Stack[top]
POP X //Mem[X]←Stack[top]
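The zero-address program above can be checked with a tiny stack-machine interpreter. This is a sketch only: memory is a dictionary keyed by variable name, and the sample values for A, B, C and D are made up for the demonstration.

```python
def run_stack_program(program: list[str], memory: dict) -> dict:
    """Execute PUSH/POP/ADD/SUB/MUL/DIV on an operand stack and return memory."""
    stack = []
    for line in program:
        parts = line.split()
        op = parts[0]
        if op == "PUSH":
            stack.append(memory[parts[1]])   # Stack[top] <- Mem[operand]
        elif op == "POP":
            memory[parts[1]] = stack.pop()   # Mem[operand] <- Stack[top]
        else:
            right = stack.pop()              # top of stack
            left = stack.pop()               # item below the top
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MUL":
                stack.append(left * right)
            elif op == "DIV":
                stack.append(left / right)
            else:
                raise ValueError(f"unknown instruction: {line}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 1, "B": 2, "C": 3, "D": 4})
print(memory["X"])  # (1 + 2) * (3 + 4)
```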
CPU Organization
Example 2:
X:=(P+Q) * R / S + (T-U) / V
CPU Organization
Three Address Instruction: X:=(P+Q) * R / S + (T-U) / V

ADD R1, P, Q
MUL R1, R1, R
DIV R1, R1, S
SUB R2, T, U

DIV R2, R2, V

ADD X, R1, R2
CPU Organization
Two Address Instruction: X:=(P+Q) * R / S + (T-U) / V

MOV R1, P
ADD R1, Q
MUL R1, R
DIV R1, S

MOV R2, T

SUB R2, U
DIV R2, V
ADD R1, R2

MOV X, R1
CPU Organization
One Address Instruction: X:=(P+Q) * R / S + (T-U) / V

LOAD P
ADD Q
MUL R
DIV S

STORE TEMP

LOAD T
SUB U
DIV V

ADD TEMP
STORE X
CPU Organization
Zero Address Instruction: X:=(P+Q) * R / S + (T-U) / V

PUSH P
PUSH Q
ADD
PUSH R
MUL
PUSH S
DIV
PUSH T
PUSH U
SUB
PUSH V
DIV
ADD
POP X
CPU Organization
RISC Instruction: X:=(P+Q) * R / S + (T-U) / V

LOAD R1, P
LOAD R2, Q
ADD R1, R1, R2
LOAD R2, R
MUL R1, R1, R2
LOAD R2, S
DIV R1, R1, R2
LOAD R2, T
LOAD R3, U
SUB R2, R2, R3
LOAD R3, V
DIV R2, R2, R3
ADD R1, R1, R2
STORE X, R1
Organization Vs Architecture
Computer Architecture is the design of the system, visible to the
assembly language programmer

What is the instruction set


How many registers
Memory addressing scheme/ addressing modes

Organization is how the architecture is implemented

How much cache memory

Implementation technology

All computers in the Intel Pentium series have the same architecture
but each version of Pentium has a different organization or
implementation
Von Neumann's Stored Program Concept
John von Neumann designed a machine at the Institute for Advanced Study
between 1945 and 1952, known as the stored-program digital computer.

It keeps its program instructions as well as its data in the same RAM.

Because instructions and data share one memory, an instruction fetch and a
data access cannot take place at the same time, which limits parallelism (pipelining).

Harvard Architecture.
Basic Performance Equation
Program execution time (T) = (N × S)/R

Where, T is the execution time

N is the number of machine instructions executed

S is the average number of basic steps (clock cycles) needed to execute one machine instruction

R is the clock rate in cycles per second, R = 1/P, where P is the duration of one clock cycle


Problem
Discuss the factors that affect the performance of a computer. An
8 GHz computer takes 7 clock cycles for ALU instructions, 11 clock
cycles for branch instructions and 6 clock cycles for data transfer
instructions. Find the total time taken by the computer to execute
a program that consists of 10 ALU instructions, 5 branch instructions
and 5 data transfer instructions.
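The numeric part of this problem follows directly from T = (N × S)/R: add up the cycles of every instruction and divide by the clock rate. A minimal sketch, with a hypothetical function name:

```python
def program_time(mix: list[tuple[int, int]], clock_hz: float) -> tuple[int, float]:
    """mix holds (instruction_count, cycles_per_instruction) pairs.

    Returns (total_cycles, total_time_in_ns).
    """
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles, total_cycles / clock_hz * 1e9


mix = [
    (10, 7),   # ALU instructions
    (5, 11),   # branch instructions
    (5, 6),    # data transfer instructions
]
cycles, time_ns = program_time(mix, 8e9)
print(cycles, time_ns)  # 70 + 55 + 30 = 155 cycles, about 19.375 ns
```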
Problem
Also explain, with a neat diagram, how the instruction SUB R0, LocA
(meaning [LocA] ← [R0] - [LocA]) is executed by the processor.
Problem
e)A computer uses a memory unit with 256K words of 32 bits
each. A binary instruction code is stored in one word of memory.
The instruction has four parts: an indirect bit, an operation code, a
register code part to specify one of 64 registers, and an address
part. How many bits are there in the operation code, the register
code part, and the address part?

Instruction format: Mode | Op-code | Register | Address

256 K = 2^8 × 2^10 = 2^18

Address = 18 bits
Mode = 1 bit
Register = 6 bits (64 = 2^6 registers)

Total = 25 bits
Op-code = 32 - 25 = 7 bits
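The same field-width calculation, written as a short function so that other memory sizes or register counts can be tried. The function and key names are illustrative, not part of the original problem.

```python
def field_widths(word_bits: int, memory_words: int, num_registers: int,
                 mode_bits: int = 1) -> dict:
    """Split one instruction word into mode, opcode, register and address fields."""
    address = (memory_words - 1).bit_length()    # bits to address every word
    register = (num_registers - 1).bit_length()  # bits to select one register
    opcode = word_bits - mode_bits - register - address
    return {"mode": mode_bits, "opcode": opcode,
            "register": register, "address": address}


print(field_widths(word_bits=32, memory_words=256 * 1024, num_registers=64))
```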
Problem
a) A computer has 64-bit instructions and 12-bit addresses. If
there are 352 three-address instructions and 2256
two-address instructions, then how many one-address instructions
can be formulated?

A three-address instruction uses 3 × 12 = 36 bits for addresses, leaving
64 - 36 = 28 bits for the opcode.

Maximum possible 3-address instructions: 2^28

Maximum possible 2-address instructions: (2^28 - 352) × 2^12

So, maximum possible 1-address instructions: ((2^28 - 352) × 2^12 - 2256) × 2^12
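The expanding-opcode count can be evaluated exactly with Python integers. The sketch below follows the reasoning in the solution: every opcode pattern left unused at one level frees one extra address field's worth of patterns at the next level. The function name is hypothetical.

```python
def max_one_address(instr_bits: int, addr_bits: int,
                    three_addr_used: int, two_addr_used: int) -> int:
    """Maximum number of one-address instructions under expanding opcodes."""
    opcode_bits = instr_bits - 3 * addr_bits               # 64 - 36 = 28
    free_after_three = 2 ** opcode_bits - three_addr_used  # unused 28-bit patterns
    # Each unused pattern, extended by one 12-bit field, gives 2^12 two-address opcodes.
    free_after_two = free_after_three * 2 ** addr_bits - two_addr_used
    # Each still-unused pattern, extended again, gives 2^12 one-address opcodes.
    return free_after_two * 2 ** addr_bits


print(max_one_address(64, 12, 352, 2256))
```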
Problem
b) PC and MAR both hold memory addresses. Justify keeping two
registers instead of one.
PC holds the address of the next instruction to be fetched. During
the fetch phase, the content of PC is copied to MAR and PC is
incremented so that it points to the next instruction of the
program. If the processor had only PC, the same register would have
to be used to fetch an operand during the execution phase, and the
address of the next instruction held in PC would be lost. Similarly,
if only MAR were available in the processor, the same problem would
occur. Hence two registers are needed in the processor to execute
a program.
Basic Operational Concept
How is an instruction executed?
Write a program to evaluate the expression
X=((A+B)*C)/(D-E*F+G*H) using 3-address instructions, and write
another program to evaluate the same expression using 0-address
instructions only. In the expression, X, A, B, C, D, E, F, G and H are
memory addresses.
Thank You
Addressing Modes
Bindu Agarwalla

Addressing Modes
Consider the following program segment. Here R1, R2 and R3 are the
general purpose registers.

        Instruction        Operation
        MOV R1, (3000)     R1←M[3000]
LOOP:   MOV R2, (R3)       R2←M[R3]
        ADD R2, R1         R2←R1+R2
        MOV (R3), R2       M[R3]←R2
        INC R3             R3←R3+1
        DEC R1             R1←R1-1
        BNZ LOOP           Branch on not zero
        HALT               Stop

Assume that the content of memory location 3000 is 10 and the
content of the register R3 is 2000. The content of each of the
memory locations from 2000 to 2010 is 100. The program is loaded
from the memory location 1000. All the numbers are in decimal.
Assume that the memory is word addressable. How many
memory references are made for accessing data when the program is
executed completely?
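One way to check the count is to simulate the program segment and tally only the data references (instruction fetches are excluded, as the question asks). This is a sketch that models the registers as Python variables and memory as a dictionary.

```python
def count_data_references(memory: dict, r3: int) -> int:
    """Run the program segment and count memory references made for data."""
    data_refs = 0

    r1 = memory[3000]        # MOV R1, (3000): one data read
    data_refs += 1

    while True:
        r2 = memory[r3]      # MOV R2, (R3): one data read
        data_refs += 1
        r2 = r2 + r1         # ADD R2, R1: registers only
        memory[r3] = r2      # MOV (R3), R2: one data write
        data_refs += 1
        r3 += 1              # INC R3
        r1 -= 1              # DEC R1
        if r1 == 0:          # BNZ LOOP falls through when R1 reaches zero
            break
    return data_refs


memory = {address: 100 for address in range(2000, 2011)}
memory[3000] = 10
print(count_data_references(memory, r3=2000))  # 1 + 10 iterations * 2
```

The loop body runs 10 times with 2 data references per pass, plus the single initial read of location 3000.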
Addressing Modes
The addressing mode refers to the way in which the operand of an
instruction is specified.

Implied Mode: In this mode the operands are specified implicitly in the
definition of the instruction.

Ex. COM // Complement accumulator

No of memory references to execute this instruction is??


ONE
Addressing Modes
Immediate Mode: In this mode, the operand is specified as part of the
instruction itself.

MOV #200, R1

Here the value 200 is moved to the register R1.

# is used to indicate an immediate operand, because a bare number could
also represent an address.

No of memory references to execute the above instruction is??

ONE
Addressing Modes
Register Mode: In this mode, the operand is specified as the content of a
general purpose register.
Ex: MOV R1,R2

No of memory references to execute the above instruction is??

ONE
Addressing Modes
Direct Mode
In this mode, the operand is in memory, and the address of the
operand is specified in the instruction itself.
Ex: MOV NUM , R2
MOV 2000, R2
Note : The address can be specified either as a numeric value or as a symbolic
one.

[Figure: memory location 2000 (labelled NUM) holds 56; after execution R2 contains 56.]

No of memory references to execute the above instruction is??

two
Addressing Modes
Register Indirect Mode

In this mode, the operand is in memory, and its address is
specified as the content of a register.

Ex: MOV (R1), R2 // R2← mem[R1]

[Figure: R1 contains 1000; memory location 1000 holds 56; after execution R2 contains 56.]

No of memory references to execute the above instruction is??

two
Addressing Modes
Memory Indirect Mode:
In this mode, the operand is in memory, and its address is held in another
memory location whose address is given in the instruction, i.e., the address of the address
of the operand is specified in the instruction.

Ex: MOV (500), R2 // R2← mem[mem[500]]
or,
MOV (num), R2 // R2← mem[mem[num]]

[Figure: memory location 500 (labelled num) holds 1000; memory location 1000 holds 56; after execution R2 contains 56.]

No of memory references to execute the above instruction is??

three
Problem
Program to add N numbers stored in memory location starting from
NUM1 LOCATION

MOVE N, R1
MOVE #NUM1, R2
CLEAR R0
LOOP ADD (R2), R0
ADD #4, R2

DECREMENT R1
BRANCH >0 LOOP
MOVE R0, SUM

H/ W: A=*B into assembly language code


Index Mode
In this mode, the operand is in memory, and its address is the
sum of an offset and the content of an index register. The offset is specified in the
address field of the instruction and represents a relative displacement, i.e.,
how far the operand is located from the base.

Ex: MOV 20(R1), R2 // R2← mem[20+[R1]]

Note: In the effective address generation, the index register content is not modified. It
is only used in the process.

[Figure: R1 contains 1000; memory location 1020 holds 78; after execution R2 contains 78.]

No of memory references to execute the above instruction is??

two
Index Mode: Where to use??

Address    Contents
N          n
LIST       Student ID   (Student 1)
LIST+4     Test 1
LIST+8     Test 2
LIST+12    Test 3
LIST+16    Student ID   (Student 2)
LIST+20    Test 1
LIST+24    Test 2
LIST+28    Test 3
.
.
If the 1st student's data is stored in memory from location 1000, then the next student's
data will be found at location 1016.
Problem
Program to find the sum of the scores on each of the three tests for a class having
N students.

        MOVE #LIST, R0
        CLEAR R1
        CLEAR R2
        CLEAR R3
        MOVE N, R4
LOOP    ADD 4(R0), R1
        ADD 8(R0), R2
        ADD 12(R0), R3
        ADD #16, R0
        DECREMENT R4
        BRANCH >0 LOOP
        MOVE R1, SUM1
        MOVE R2, SUM2
        MOVE R3, SUM3
Addressing Modes
Index Mode:

Other variants:
1. (Ri,Rj)
EA=[Ri] +[Rj]

2.
X(Ri,Rj)
EA=X+[Ri] +[Rj]
Addressing Modes
Relative Mode
In this mode, the effective address is the sum of the offset specified in the
instruction and the contents of the Program Counter (PC). The offset is a signed
displacement, i.e., how far the target is located from the instruction that
follows the current one.
Ex: Branch > 0 LOOP

              MOVE N, R1
              MOVE #NUM1, R2
              CLEAR R0
1000: LOOP    ADD (R2), R0
1004          ADD #4, R2
1008          DECREMENT R1
1012          BRANCH >0 LOOP
1016          MOVE R0, SUM

Here, when the branch instruction is executed, the value of PC
is 1016 (the address of the next instruction). To jump to the branch
target (the ADD instruction), PC must be set to 1000, so -16 is added
to 1016 to get 1000. This -16 is the offset encoded for the label of the
branch instruction, here LOOP, i.e., LOOP is represented as -16.
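The offset in the branch example comes from subtracting the updated PC from the target address. A minimal sketch, assuming 4-byte instructions as in the listing (addresses 1000, 1004, ...); the function name is hypothetical.

```python
def branch_offset(target: int, branch_address: int, instruction_size: int) -> int:
    """Offset stored in a relative branch so that PC + offset == target.

    PC already points to the instruction after the branch when the
    offset is added, hence branch_address + instruction_size.
    """
    updated_pc = branch_address + instruction_size
    return target - updated_pc


offset = branch_offset(target=1000, branch_address=1012, instruction_size=4)
print(offset)            # 1000 - 1016
print(1016 + offset)     # the branch lands back on LOOP
```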
Autoincrement mode
In this mode, the operand is in memory. The effective address of the operand
is the content of a register specified in the instruction. After accessing the operand,
the content of the register is automatically incremented to point to the next item in a
list.

ADD (R2)+, R0

No of memory references to execute the above instruction is??

TWO
Autodecrement mode
In this mode, the operand is in memory. The effective address of the operand
is the content of a register specified in the instruction. Before accessing the operand,
the content of the register is automatically decremented to point to the operand in a
list.

ADD -(R2), R0

No of memory references to execute the above instruction is??

TWO
Problem
Program to add N numbers stored in memory location starting from
NUM1 LOCATION

MOVE N, R1
MOVE #NUM1, R2
CLEAR R0
LOOP ADD (R2)+, R0

DECREMENT R1
BRANCH >0 LOOP
MOVE R0, SUM
Addressing Modes

• The different ways in which the location of an operand is specified in an
instruction are referred to as addressing modes.

Name                          Assembler syntax    Addressing function
Immediate                     #Value              Operand = Value
Register                      Ri                  EA = Ri
Absolute (Direct)             LOC                 EA = LOC
Indirect                      (Ri)                EA = [Ri]
                              (LOC)               EA = [LOC]
Index                         X(Ri)               EA = [Ri] + X
Base with index               (Ri,Rj)             EA = [Ri] + [Rj]
Base with index and offset    X(Ri,Rj)            EA = [Ri] + [Rj] + X
Relative                      X(PC)               EA = [PC] + X
Autoincrement                 (Ri)+               EA = [Ri]; Increment Ri
Autodecrement                 -(Ri)               Decrement Ri; EA = [Ri]
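The addressing functions in the table can be collected into one small effective-address calculator. This is an illustrative sketch: registers and memory are dictionaries, the mode names are made-up strings, and the increment step of 1 is an assumption (a byte-addressable machine with 4-byte words would use 4).

```python
def effective_address(mode: str, regs: dict, memory: dict, *,
                      ri: str = None, rj: str = None,
                      x: int = 0, loc: int = None, step: int = 1) -> int:
    """Return the effective address (EA) for a memory-operand addressing mode."""
    if mode == "absolute":
        return loc                        # EA = LOC
    if mode == "register_indirect":
        return regs[ri]                   # EA = [Ri]
    if mode == "memory_indirect":
        return memory[loc]                # EA = [LOC]
    if mode == "index":
        return regs[ri] + x               # EA = [Ri] + X
    if mode == "base_index":
        return regs[ri] + regs[rj]        # EA = [Ri] + [Rj]
    if mode == "base_index_offset":
        return regs[ri] + regs[rj] + x    # EA = [Ri] + [Rj] + X
    if mode == "relative":
        return regs["PC"] + x             # EA = [PC] + X
    if mode == "autoincrement":
        ea = regs[ri]                     # EA = [Ri], then increment Ri
        regs[ri] += step
        return ea
    if mode == "autodecrement":
        regs[ri] -= step                  # decrement Ri, then EA = [Ri]
        return regs[ri]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 1000, "R2": 200, "PC": 1016}
memory = {500: 1000}
print(effective_address("index", regs, memory, ri="R1", x=20))
print(effective_address("memory_indirect", regs, memory, loc=500))
print(effective_address("relative", regs, memory, x=-16))
```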
Assembler Directives
These are not executable statements. They are directives (commands) used by the
assembler while translating an assembly language program into a machine language
program.
Examples:

EQU:
SUM EQU 200

It informs the assembler that wherever SUM is used, it should be replaced by the value
200.

DATAWORD:
NUM DATAWORD 200

END:
END START

RESERVE:
NUM RESERVE 100

ORIGIN:
ORIGIN 200

RETURN
Numericals
A two-word instruction is stored at location A. The operand part of the
instruction holds B. If the addressing mode is relative, at which location is
the operand available?

A relative mode branch type instruction is stored in memory at an address 750. The
branch is made to an address 500. What should be the value of the relative address field of
the instruction?
Assembly Language Instruction
Label Operation Operand(s) comment

An instruction is stored at location 200 with it’s address field having the value 10. A
processor register R10 contains the value 210 which is also used as index register.
Evaluate the effective address of the operand if the addressing mode of the instruction is
(i)direct;(ii)register direct;(iii)register indirect;(iv)indexed.

How many memory references are required to execute the following


instructions?
i)ADD (R1) , R3 where R3 is the destination
ii)SUB #600 , R5 where R5 is the destination
Numericals
An instruction is a 24 bit instruction. It is a byte addressable memory. The PC
contains 300. Which one of the following is a legal PC value:

(a) 400 (b) 500 (c) 600 (d) 700

Registers R1 and R2 of a computer contain the decimal values 100 and 200. What is the
effective address of the memory operand in each of the following instructions?
i)LOAD 20 (R2), R1
ii)MOVE 300, R5
iii)ADD (R1), R2
iv)MUL (R1)+, R5
Numericals
a)A machine has a 32-bit architecture, with 1-word long instructions. It has 60
registers, each of which is 32 bits long. It needs to support 45 instructions,
which have an immediate operand in addition to two register operands.
Assuming that the immediate operand is an unsigned integer, what is the
maximum value of the immediate operand?
Numericals
A two-word instruction is stored in memory at an address designated by the symbol P.
The address field of the instruction (stored at P+1) is designated by the symbol Q. The
operand used during the execution of the instruction is stored at an address symbolized
by EA. An index register contains the value X. State how EA is calculated from the
other addresses if the addressing mode of the instruction is direct, indirect, relative, and
indexed.
Numericals
Write the number of memory references required for executing the following
instructions:
i)ADD R1,(R2)+
ii)SUB #10,R2
iii)MOV R1, 20(R3,R4)
iv)AND R1,R2
v)Increment A
Numericals
An instruction is kept in memory at an address 300 and the memory address 301 occupies
the address field of the instruction which is shown below. The Opcode is used to add the
content of accumulator with an operand. The content of accumulator is 100 and the
content register R5 is 400. Find out the content of accumulator and Effective address of
operand if the addressing mode is
(i) immediate (ii) direct (iii) register direct (iv) indirect (v) register indirect
Address Instruction
300 Opcode Mode

301 500

400 700

401 456

500 600

600 800
Numericals
A general purpose register organization computer has a 16 bit instruction
consisting of opcode, source register and a destination register. It supports 7 no of
arithmetic operations and 6 no of logical operations. Find the total number of
maximum registers present in the system.

Consider a processor with 64 registers that supports twelve instructions. Each instruction
has five distinct fields, namely, opcode, two source register fields, one destination register
field, and a twelve-bit immediate value. Each instruction must be stored in memory in a
byte-aligned fashion. If a program has 100 instructions, What is the amount of memory
(in bytes) consumed by the program?
Numericals
A two word instruction LOAD is stored at location 1000 with its address field at location
1001. The address field has the value 2000 and the value stored at 2000 is 5000 and at
5000 is 6500. The words stored at 2200, 3002 are 3500, 4000 respectively. An index
register has value 200. Evaluate the effective address and operand if addressing mode of
the instruction is as follows:
I. Memory Indirect Addressing Mode
II. Relative Addressing Mode
III. Index Addressing Mode
Numericals
Write the equivalent instructions for Zero Address Organization and One
Address Organization of the following instructions:
MOV P, R1
SUB Q, R1
DIV R, R1
MUL S, R1
MOV R1, X
Numericals
Both of the following statements cause the value 150 to be stored in location
2000
ORIGIN 2000
DATAWORD 150
And
Move #150,2000
Explain the difference.
Numericals
Match each of the high level language statements given on the left hand side
with the most natural addressing mode from those listed on the right hand side.

1. A[1] = B[J]; a) Indirect addressing


2. while (*A++); b) Indexed addressing
3. int temp = *x; c) Autoincrement

Match columns:
A B
Indirect Relocatable code
Index Passing array as a parameter
Base Register Array
Auto increment while (*A++)
Home Work
A program is required for the task C[i]=A[i] x B[i]. Write a program for this task
on a computer that supports one address instructions. Assume that C, A[i] and
B[i] are located in main memory and the values is stored in main memory
location N.
Instruction Set
An extensive set of instructions is provided to carry out various computational tasks.

According to the operation carried out by the computer , the instructions are classified into
3 categories:

1. Data Transfer Instruction

2. Data Manipulation Instruction

a. Arithmetic Instruction
b. Logical Instruction

c. Shift Instruction

3. Program Control Instruction


Types of Instructions
• Data Transfer Instructions (the data value is not modified)
Name        Mnemonic
Load        LD
Store       ST
Move        MOV
Exchange    XCH
Input       IN
Output      OUT
Push        PUSH
Pop         POP
Data Manipulation Instructions
• Arithmetic Name Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with carry ADDC
Subtract with borrow SUBB
Negate NEG
Data Manipulation Instructions
• Logical & Bit Manipulation

Name Mnemonic
Clear CLR
Complement COM
AND AND
OR OR
Exclusive-OR XOR
Clear carry CLRC
Set carry SETC
Complement carry COMC
Enable interrupt EI
Disable interrupt DI
Data Manipulation Instructions
AND:
is used to reset (clear) a specific bit position in a register, keeping all
other bits intact (unchanged).
R1 0 1 0 1 1 0 0 1

Suppose we want to change the 4th bit from the right to 0 without disturbing the other bits. Then
we need to AND that bit with 0 and all other bits with 1, since X AND 1 = X
and X AND 0 = 0.

0 1 0 1 1 0 0 1

AND 1 1 1 1 0 1 1 1

0 1 0 1 0 0 0 1

AND #F7H, R1
Data Manipulation Instructions
OR:
is used to SET a specific bit position in a register, keeping all
other bits intact (unchanged).
R1 0 1 0 1 1 0 0 1

Suppose we want to change the 3rd bit from the right to 1 without disturbing the other bits. Then
we need to OR that bit with 1 and all other bits with 0, since X OR 1 = 1
and X OR 0 = X.

0 1 0 1 1 0 0 1

OR 0 0 0 0 0 1 0 0

0 1 0 1 1 1 0 1

OR #04H, R1
Data Manipulation Instructions
XOR:
is used to CLEAR the contents of a register.

R1 0 1 0 1 1 0 0 1

As X XOR X = 0

0 1 0 1 1 0 0 1

XOR 0 1 0 1 1 0 0 1

0 0 0 0 0 0 0 0

XOR R1, R1
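The three masking idioms (AND to clear a bit, OR to set a bit, XOR with itself to clear a register) can be reproduced with Python's bitwise operators, using the same 8-bit value of R1 as the slides. The variable names are illustrative.

```python
R1 = 0b01011001                 # the register value used in the examples

cleared = R1 & 0xF7             # AND #F7H: mask 1111 0111 clears the 4th bit from the right
set_bit = R1 | 0x04             # OR #04H: mask 0000 0100 sets the 3rd bit from the right
zeroed = R1 ^ R1                # XOR R1, R1: every bit XORed with itself gives 0

print(format(cleared, "08b"))   # 01010001
print(format(set_bit, "08b"))   # 01011101
print(format(zeroed, "08b"))    # 00000000
```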
Problems
SETC:
CF ← 1

CLRC:

CF ← 0

COMC:

CF ← NOT CF

EI

IF ← 1

DI:

IF ← 0
Data Manipulation Instructions
• Shift
Name Mnemonic
Logical shift right SHR
Logical shift left SHL
Arithmetic shift right SHRA
Arithmetic shift left SHLA
Rotate right ROR
Rotate left ROL
Rotate right through carry RORC
Rotate left through carry ROLC
Logical Left Shift Instruction
SHL: Logical left shift for unsigned numbers.
Provides a means for shifting blocks of bits within a register or memory.

C ← REGISTER ← 0

The contents of the operand are shifted left by the number of bit positions specified in the source
operand of the instruction. The vacated bits are filled with zeros. The bits shifted out are passed
through the C flag and then dropped. Left shifting an operand by k bit positions is equivalent to multiplying
the operand by 2^k, as long as no 1 bits are shifted out.

R1 0 0 0 1 0 0 0 0 0

SHL #2, R1
0 0 1 0 0 0 0 0 0
R1
Logical Right Shift Instruction
SHR: Logical right shift for unsigned numbers.
Provides a means for shifting blocks of bits within a register or
memory.
0 → REGISTER → C

The contents of the operand are shifted right by the number of bit positions specified in the source
operand of the instruction. The vacated bits are filled with zeros. The bits shifted out are passed
through the C flag and then dropped. Right shifting an operand by k bit positions is equivalent to dividing the
operand by 2^k and keeping the integer quotient.

R1 0 0 0 1 0 0 0 0 0

SHR #2, R1
0 0 0 0 0 1 0 0 0
R1
Arithmetic Left Shift Instruction
SHLA: Arithmetic left shift for signed numbers.
Provides a means for shifting blocks of bits within a register or
memory.
C ← REGISTER ← 0

The contents of the operand are shifted left by the number of bit positions specified in the source
operand of the instruction. The vacated bits are filled with zeros. The bits shifted out are passed
through the C flag and then dropped. Left shifting an operand by k bit positions is equivalent to multiplying
the operand by 2^k, provided the sign bit does not change.

R1 0 0 0 1 0 0 0 0 0

SHLA #2, R1
0 0 1 0 0 0 0 1 0 R1

SHLA affects the Overflow flag: V = R(n-1) XOR R(n-2), taken from the bits before each one-position shift, i.e., the Overflow flag will be 1 if the
sign changes after the shift operation, else it will be 0.
Arithmetic Right Shift Instruction
SHRA: Arithmetic right shift for signed numbers.
Provides a means for shifting blocks of bits within a register or
memory.
sign bit → REGISTER → C

The contents of the operand are shifted right by the number of bit positions specified in the source
operand of the instruction. The vacated bits are filled with the previous sign bit. The bits shifted out
are passed through the C flag and then dropped. Right shifting an operand by k bit positions is equivalent to
dividing the operand by 2^k, with the quotient rounded down (toward minus infinity).

R1 1 0 0 1 0 0 1 0 0

SHRA #1, R1
1 1 0 0 1 0 0 1 0
R1

Here, R1 contains -110 before the shift operation, and after the SHRA,
it contains -55
Arithmetic Right Shift Instruction
Example 1

R1 1 1 0 0 1 0 0 1 0

SHRA #1, R1

R1 1 1 1 0 0 1 0 0 1
Here, R1 contains -55 before the shift operation, and after the SHRA, it
contains -28
Example 2

R1 1 1 1 1 1 0 0 1

SHRA #1, R1
R1 1 1 1 1 1 1 0 0 1
Here, R1 contains -7 before the shift operation, and after the SHRA, it
contains -4
Arithmetic Right Shift Instruction
Example 3

R1 0 0 0 0 1 1 1 1 0

SHRA #1, R1

R1 0 0 0 0 0 1 1 1 1

Here, R1 contains +15 before the shift operation, and after the SHRA, it
contains +7
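The shift examples above can be reproduced with a minimal Python sketch. It assumes an 8-bit register and models each instruction as a loop of one-position shifts; the function names (`shift_left`, `shr`, `shra`, `to_signed`) are illustrative, not part of any instruction set.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def to_signed(x):
    """Interpret an 8-bit pattern as a 2's complement number."""
    return x - (1 << WIDTH) if x & SIGN else x

def shift_left(x, k):
    """SHL / SHLA bit movement: returns (result, C, V)."""
    carry = overflow = 0
    for _ in range(k):
        carry = (x >> (WIDTH - 1)) & 1
        # V becomes 1 if the sign bit changes in any one-position shift
        overflow |= carry ^ ((x >> (WIDTH - 2)) & 1)
        x = (x << 1) & MASK
    return x, carry, overflow

def shr(x, k):
    """Logical right shift: zero fill, returns (result, C)."""
    carry = 0
    for _ in range(k):
        carry = x & 1
        x >>= 1
    return x, carry

def shra(x, k):
    """Arithmetic right shift: sign-bit fill, returns (result, C)."""
    sign = x & SIGN
    carry = 0
    for _ in range(k):
        carry = x & 1
        x = (x >> 1) | sign
    return x, carry

result, c = shra(0b10010010, 1)
print(f"{result:08b}", to_signed(result), c)
```

For the first SHRA example, the sketch turns 10010010 (-110) into 11001001 (-55), matching the slide.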
Representing Signed No in 2’s Complement Method
For a +ve number, to represent it in the 2's complement method, the sign bit, i.e., the MSb, is made
0, and for the magnitude part, just write the binary of the
number.
Example: +14 in 8 bits:
MSb will be 0, then the binary of 14 in 7 bits, i.e., 0001110
So, +14 = 00001110
For a -ve number, to represent it in the 2's complement method, the sign bit, i.e., the MSb, is made
1, and for the magnitude part, take the 2's complement of
the binary of the number. The 2's complement of a binary number can be taken by copying
the bits from the LSb up to and including the first 1, then flipping all the remaining bits.

Example: -14 in 8 bits:
MSb will be 1,
then the binary of 14 in 7 bits, i.e., 0001110
Next take the 2's complement of 0001110
So, start copying from the LSb: the two rightmost bits, 10, are copied, then all the remaining bits are flipped,
hence the result will be 1110010
So, -14 = 11110010
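Both routes to the 2's complement pattern can be sketched in Python. The helper names (`twos_complement`, `copy_and_flip`) are hypothetical and only illustrate the rule stated above.

```python
def twos_complement(value, bits=8):
    """Return the 2's complement bit string of a signed integer."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def copy_and_flip(bit_string):
    """Copy bits from the LSb up to and including the first 1, flip the rest."""
    i = bit_string.rfind("1")
    if i == -1:                      # all zeros: nothing to flip
        return bit_string
    flipped = "".join("1" if b == "0" else "0" for b in bit_string[:i])
    return flipped + bit_string[i:]

print(twos_complement(14))           # +14
print(twos_complement(-14))          # -14
print(copy_and_flip("0001110"))      # magnitude part of -14
```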
Rotate Right Instruction (ROR)
The bits of the destination are rotated right. The number of bit positions rotated is determined
by the source operand. Each bit rotated out of the least significant bit of the operand goes
to both the carry bit and the most significant bit of the operand.
Rotate Left Instruction (ROL)
The bits of the destination are rotated left. The number of bit positions rotated is determined
by the source operand. Each bit rotated out of the most significant bit of the operand goes to
both the carry bit and the least significant bit of the operand.
Rotate Right through Carry Instruction (RORC)
The bits of the destination are rotated right. The number of bit positions rotated is determined
by the source operand. Each bit rotated out of the least significant bit of the operand goes
to the carry bit, and the previous carry bit goes to the most significant bit of the operand.
Rotate Left through Carry Instruction (ROLC)
The bits of the destination are rotated left. The number of bit positions rotated is determined
by the source operand. Each bit rotated out of the most significant bit of the operand goes to
the carry bit, and the previous carry bit goes to the least significant bit of the operand.
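The difference between rotating and rotating through carry is easiest to see in code. The following is a minimal Python sketch that assumes an 8-bit operand; the function names simply mirror the mnemonics in the table.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def ror(x, k):
    """Rotate right: the LSb goes to both C and the MSb."""
    carry = 0
    for _ in range(k):
        carry = x & 1
        x = (x >> 1) | (carry << (WIDTH - 1))
    return x, carry

def rol(x, k):
    """Rotate left: the MSb goes to both C and the LSb."""
    carry = 0
    for _ in range(k):
        carry = (x >> (WIDTH - 1)) & 1
        x = ((x << 1) & MASK) | carry
    return x, carry

def rorc(x, carry, k):
    """Rotate right through carry: LSb -> C, old C -> MSb."""
    for _ in range(k):
        new_carry = x & 1
        x = (x >> 1) | (carry << (WIDTH - 1))
        carry = new_carry
    return x, carry

def rolc(x, carry, k):
    """Rotate left through carry: MSb -> C, old C -> LSb."""
    for _ in range(k):
        new_carry = (x >> (WIDTH - 1)) & 1
        x = ((x << 1) & MASK) | carry
        carry = new_carry
    return x, carry

print(ror(0b00000001, 1), rorc(0b00000001, 0, 1))
```

With the same operand 00000001 and C = 0, ROR moves the 1 into the MSb, while RORC moves it into C and brings the old carry (0) into the MSb.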
Program Control Instructions
Name Mnemonic
Branch BR
Jump JMP
Skip SKP
Call CALL
Return RET
Compare (Subtract) CMP
Test (AND) TST
Program Control Instructions
Call: is used to call a subroutine.
1000: Call P1
1004: Next Instruction
1. Stack[top]← [PC] // Return address, i.e., 1004 is stored onto the stack.
and then
2. PC← ADDRESS OF THE SUBROUTINE // Here it is represented by P1

Return: is used to return from a subroutine.


PC← Stack[top]// Return address, i.e., 1004 is restored from the stack.
Program Control Instructions
Compare: is used to compare two numbers.
CMP src, dst
performs the operation: [dst] -[src]
Sets the condition code flags based on the result obtained.

Neither of the operands is changed.


Program Control Instructions
Test: is used to check the value of a particular bit position of an operand.
TST #bit position, operand
performs a non-destructive AND operation: the result is not stored.
Sets the condition code flags based on the result obtained.

Neither of the operands is changed.


Register Transfer Notation
• Identify a location by a symbolic name standing for its hardware binary
address (LOC, R0,…)
• Contents of a location are denoted by placing square brackets around the
name of the location (R1←[LOC], R3 ←[R1]+[R2])
• Register Transfer Notation (RTN)
Condition Codes
• Condition code flags
• Condition code register / status register
• N (negative)
• Z (zero)
• V (overflow)
• C (carry)
• Different instructions affect different flags
Conditional Branch Instructions

• Example: A − B is computed as A + (−B), where −B is the 2's complement of B.
        A: 1 1 1 1 0 0 0 0
        B: 0 0 0 1 0 1 0 0

        A: 1 1 1 1 0 0 0 0
    +(−B): 1 1 1 0 1 1 0 0
   Result: 1 1 0 1 1 1 0 0

C = 1, Z = 0, N = 1, V = 0
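The flag values in this example can be derived mechanically. The sketch below is a hedged Python illustration: it assumes 8-bit operands and, as in the example, takes C to be the carry out of the addition A + (−B); the function name `subtract_flags` is hypothetical.

```python
def subtract_flags(a, b, bits=8):
    """Compute A - B as A + (-B) and return (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    neg_b = (-b) & mask              # 2's complement of B
    total = a + neg_b
    result = total & mask
    flags = {
        "C": total >> bits,                     # carry out of the addition
        "Z": int(result == 0),                  # result is zero
        "N": int(bool(result & sign)),          # sign bit of the result
        # overflow: both addends have a sign different from the result's sign
        "V": int(bool((a ^ result) & (neg_b ^ result) & sign)),
    }
    return result, flags

result, flags = subtract_flags(0b11110000, 0b00010100)
print(f"{result:08b}", flags)
```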
Basic Performance Equation

• T – processor time required to execute a program that has been prepared in a
high-level language
• N – number of actual machine language instructions needed to complete the
execution (note: instructions inside a loop are counted each time they execute)
• S – average number of basic steps needed to execute one machine instruction. Each
step completes in one clock cycle
• R – clock rate
• T = (N x S) / R
• Note: N, S and R are not independent of each other

How to improve T?
Stack

Bindu Agarwalla

STACK

Address        Contents
0
.
SP = 1980      -12      Current top element
1984           23
1988           88
1992
1996
BOTTOM: 2000   56       Bottom element
.
2^k - 1
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Push Operation
• SP always points to the top element, so to push a new item, SP first has to be
updated to the location where the new element can be stored.
• As the stack grows downward in memory (toward lower addresses), SP is first decremented,
then the NEWITEM is pushed.

Note: This rule applies to pushing any element onto the stack.

Subtract #4, SP
Move NEWITEM, (SP)
↓
Move NEWITEM, -(SP)

Stack after pushing NEWITEM = 19:

Address        Contents
0
.
SP = 1976      19       Current top element
1980           -12
1984           23
1988           88
1992
1996
2000           56       Bottom element
.
2^k - 1
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Push Operation
• If the stack starts in memory at address 2000, then the initial value of SP
should be 2004; only then will the 1st element be pushed at the correct
location, 2000.
STACK [Assumption: The machine is byte addressable
and each element in the stack occupies 4 bytes]
Pop Operation
• SP always points to the top element, so to pop an element, it is first
copied off the stack, then SP is updated to point to the next top element.
• As the stack grows downward in memory, SP is incremented after
the top item is popped off.

Move (SP), ITEM
Add #4, SP
↓
Move (SP)+, ITEM

Stack before the pop:

Address        Contents
0
.
SP = 1980      -12      Current top element
1984           23
1988           88
1992
1996
2000           56       Bottom element
.
2^k - 1

After the pop: ITEM = -12 and SP = 1984.
Limitation of Link Register
and use of stack for subroutine
Linkage

Bindu Agarwalla

STACK [Assumption: the stack occupies addresses 2000 down to 1500]
SafePush Operation

SAFEPUSH Compare #1500, SP
         Branch <= 0 FULLERROR
         Move NEWITEM, -(SP)

A push must not be done on a full stack. If the value of SP is 1500 or less,
the stack is already full, so the Compare operation ([SP] - 1500) gives either 0 or a negative
value. If SP is 1500, the next push would be done at 1496, which
is not part of the stack.

SafePop Operation

SAFEPOP Compare #2000, SP
        Branch > 0 EMPTYERROR
        Move (SP)+, ITEM

No element should be popped from an empty stack. If the value of SP is greater than
2000, the stack is already empty, so the Compare operation ([SP] - 2000) gives a positive value.
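The SafePush and SafePop checks can be sketched in Python as follows. This is an illustration under the slide's assumptions (stack from 2000 down to 1500, 4-byte elements, byte-addressable memory); the class name `Stack` and the dictionary standing in for memory are hypothetical.

```python
class Stack:
    """Downward-growing stack with the SafePush / SafePop checks."""

    def __init__(self, bottom=2000, limit=1500, word=4):
        self.bottom, self.limit, self.word = bottom, limit, word
        self.sp = bottom + word      # empty stack: SP = 2004
        self.mem = {}                # address -> contents

    def safe_push(self, item):
        if self.sp - self.limit <= 0:        # Compare #1500, SP; Branch <= 0
            raise OverflowError("FULLERROR")
        self.sp -= self.word                 # Move NEWITEM, -(SP)
        self.mem[self.sp] = item

    def safe_pop(self):
        if self.sp - self.bottom > 0:        # Compare #2000, SP; Branch > 0
            raise IndexError("EMPTYERROR")
        item = self.mem.pop(self.sp)         # Move (SP)+, ITEM
        self.sp += self.word
        return item

s = Stack()
s.safe_push(56)
s.safe_push(19)
print(s.sp, s.safe_pop(), s.sp)
```

Starting SP at 2004 follows the rule that the first element must land at address 2000.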
SUBROUTINES
The way in which a computer makes it possible to call and return from
subroutines is referred to as subroutine linkage method.
Linkage using Link Register
On Call

1. Store the contents of the PC in the link register.

2. Branch to the target address specified by the instruction

On returning from a subroutine


Branch to the address contained in the link register
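Subroutine Linkage using link register

Calling program              Subroutine Sum
1000 CALL Sum                2000 First Instruction
1004 Next Instruction             .
                                  Return

On Call:   Link ← 1004 (return address), PC ← 2000
On Return: PC ← [Link] = 1004

Limitation?
No support for nesting of functions: a nested call overwrites the link register, so the return address of the outer call is lost.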
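The nesting problem can be illustrated with a small Python sketch. It assumes a main program that calls SUB1, which in turn calls SUB2; the return addresses 1004 and 3012 are illustrative values, and the two functions are hypothetical models rather than real machine behaviour.

```python
def nested_calls_with_link_register():
    """A single link register holds only the most recent return address."""
    link = 1004            # main: CALL SUB1, return address 1004 saved
    link = 3012            # SUB1: CALL SUB2 overwrites the saved 1004
    pc = link              # SUB2 returns correctly to 3012
    pc = link              # SUB1 returns: link still holds 3012, not 1004
    return pc

def nested_calls_with_stack():
    """A stack keeps one return address per active call."""
    stack = []
    stack.append(1004)     # main: CALL SUB1
    stack.append(3012)     # SUB1: CALL SUB2
    first = stack.pop()    # Return from SUB2 -> 3012
    second = stack.pop()   # Return from SUB1 -> 1004
    return first, second

print(nested_calls_with_link_register())
print(nested_calls_with_stack())
```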
Limitation of Link Register
Nesting of functions/subroutines is not supported.
Solution
Using Stack: To support nesting of functions.

Using Stack: Calling a function and returning from it.

Assumptions:
Parameters are passed through general purpose registers.
Returning values through general purpose registers.
Example:

We are going to add N numbers stored in consecutive memory locations starting from
the symbolic address NUM1 using a function. the function is going to return the
summation result to the caller.
Stack as Subroutine Linkage Method
Calling program

1000 Move N, R1
1004 Move #NUM1, R2
1008 Call LISTADD
1012 Move R0, SUM
.

Subroutine

LISTADD Clear R0
LOOP    Add (R2)+, R0
        Decrement R1
        Branch > 0 LOOP
        Return

Stack
Before the Call: SP = 2000, top of stack contains 14
After the Call:  SP = 1996, top of stack contains 1012 (return address)
Stack as Subroutine Linkage Method [Parameter
Passing and returning value using stack]
When a large number of parameters must be passed to a function, there may not
be enough general purpose registers.

Solution:

Parameters are passed onto the stack, before calling the function.

Returning values through the stack.


Stack as Subroutine Linkage Method [Parameter Passing and
returning value using stack]
Calling program

1000 Move #NUM1, -(SP)
1004 Move N, -(SP)
1008 Call LISTADD
1012 Move 4(SP), SUM
1016 Add #8, SP

Subroutine

LISTADD MoveMultiple R0-R2, -(SP)
        Move 16(SP), R1
        Move 20(SP), R2
        Clear R0
LOOP    Add (R2)+, R0
        Decrement R1
        Branch > 0 LOOP
        Move R0, 20(SP)
        MoveMultiple (SP)+, R0-R2
        Return

Stack contents at each stage (the stack top before the first push is at 2004 and contains 28):

1. After the Call (SP = 1992)
1992 1012 (Ret Addr)
1996 10 (Counter)
2000 3000 (Base Addr)
2004 28

2. After MoveMultiple R0-R2, -(SP) (SP = 1980)
1980 [R2]
1984 [R1]
1988 [R0]
1992 1012
1996 10
2000 3000
2004 28

3. After Move R0, 20(SP) (SP = 1980)
1980 [R2]
1984 [R1]
1988 [R0]
1992 1012
1996 10
2000 Result of summation
2004 28

4. After MoveMultiple (SP)+, R0-R2 (SP = 1992)
1992 1012
1996 10
2000 Result of summation
2004 28

5. After Return (SP = 1996)
1996 10
2000 Result of summation
2004 28

Back in the calling program, Move 4(SP), SUM reads the result from location 2000, and Add #8, SP removes the two parameter slots, leaving SP = 2004.
Numerical
The content of the top of the memory stack is 2452. The content of SP is
1258. A two-byte call subroutine instruction is located at memory
address 1456, followed by the address field of 5490 at location 1457. What
are the contents of PC, SP and the top of the stack:
1.Before call instruction execution
2.After call instruction execution
3.After return from subroutine
Numerical
How many times a subroutine should be called so that the stack becomes full, Assume
that the stack address space ranges from 2000 to 1600 and each stack word consumes 4
bytes and machine is byte addressable.[Note: No parameter, return value, registers,
local variables are stored in the stack due to subroutine call]
Numerical
Given the following program fragment
Main Program First Subroutine SUB1 Second Subroutine SUB2
2000 ADD R1, R2 3000 MOV R1,R2 4000 SUB R6, R1
2004 XOR R3, R4 3004 ADD R5, R1 4008 XOR R1, R5
2008 CALL SUB1 3008 CALL SUB2 4012 RETURN
2012 SUB R4, R5 3012 RETURN

Initially the stack pointer SP contains 5000.


What are the content of PC, SP, and the top of the stack?
i) After the subroutine call instruction is executed in the main program?
ii) After the subroutine call instruction is executed in the subroutine SUB1?
iii) After the return from SUB2 subroutine?
Numerical
The content of the top of the memory stack is 5000. The content of the stack
pointer SP is 3000. Assume you want to organize a nested subroutine calls on a
computer as follows:
The routine Main calls a subroutine SUB1 by executing a two-word call subroutine
instruction located in memory at address 1000 followed by the address field of 6000 at
location 1001.Again subroutine SUB1 calls another subroutine SUB2 by executing a
two-word call subroutine instruction located in memory at address 6050 followed by the
address field of 8000 at location 6051 . What are the content of PC, SP, and the top of
the stack?
i) After the subroutine call instruction is executed in the main routine?
ii) After the subroutine call instruction is executed in the subroutine SUB1?
iii) After the return from SUB2 subroutine?
Numerical
Given the following program fragment
Main Program First Subroutine SUB1 Second Subroutine SUB2
6000 8000 1st Inst
1000 CALL SUB1(6000) 6050 CALL SUB2(8000)
1002 Next Inst 6052 Next Inst

6080 Return 8080 Return


Initially the stack pointer SP contains 5000.
Basic Performance Equation

Numerical 2
Suppose a program (or a program task) takes 1 billion instructions to execute on a
processor running at 2 GHz. Suppose also that 50% of the instructions execute in 3 clock
cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles. What is the
execution time for the program or task?

Solution
Total number of instructions = N = 10^9

Cycles (Value)   Frequency   Product
3                0.5         1.5
4                0.3         1.2
5                0.2         1.0

S = CPI = 1.5 + 1.2 + 1.0 = 3.7

R = 2 GHz = 2 x 10^9 cycles/sec
So, T = (S x N)/R
= (3.7 x 10^9)/(2 x 10^9) sec = 1.85 sec
Basic Performance Equation
The basic performance equation, which is fundamental to measuring
computer performance, gives the CPU time as follows.
CPU time to execute a program = T = time/program

Time required to execute a basic step = P = time/cycle

Average number of basic steps required to execute one machine instruction =
CPI = S = cycles/instruction

Number of instructions a program contains = N = instructions/program

Time/Program = Time/Cycle x Cycles/Instruction x Instructions/Program

CPU Time = P x CPI x N

T = P x S x N
T = (S x N)/R

where R = 1/P is the clock rate (cycles/sec).
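The equation T = (S x N)/R translates directly into code. The following Python sketch uses hypothetical helper names (`average_cpi`, `cpu_time`) and applies them to the instruction mix of the 2 GHz numerical (1 billion instructions; 50%, 30% and 20% taking 3, 4 and 5 cycles).

```python
def average_cpi(mix):
    """mix: iterable of (cycles, fraction of instructions) pairs."""
    return sum(cycles * fraction for cycles, fraction in mix)

def cpu_time(n_instructions, cpi, clock_rate_hz):
    """T = (S x N) / R, in seconds."""
    return cpi * n_instructions / clock_rate_hz

cpi = average_cpi([(3, 0.5), (4, 0.3), (5, 0.2)])
t = cpu_time(1e9, cpi, 2e9)
print(f"CPI = {cpi:.2f}, T = {t:.3f} s")
```

Because N, S and R interact, changing one (for example a higher clock rate) only helps if the others do not get worse.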


Numerical
An 8 GHz computer takes 7 clock cycles for ALU instructions, 11 clock cycles for
branch instructions and 6 clock cycles for data transfer instructions. Find the total
time taken by the computer to execute a program that consists of 10 ALU instructions,
5 branch instructions and 5 data transfer instructions.

Solution
Total number of instructions = N = 10 + 5 + 5 = 20

Total cycles = 7*10 + 11*5 + 6*5 = 70 + 55 + 30 = 155

S = CPI = 155/20 = 7.75
R = 8 GHz = 8 x 10^9 cycles/sec

So, T = (S x N)/R
= ((155/20) x 20)/(8 x 10^9) sec = 19.375 x 10^-9 sec = 19.375 nsec.
Numerical 2
A program is running on an 8MHz processor. Assume that 50% instructions perform an
ALU operation, 30% instructions perform memory operation and 20% instructions
perform Branching. Further assume that instructions performing an ALU operation take 4
clock cycles, instructions performing memory operation take 9 clock cycles and
instructions performing Branching take 7clock cycles. What is the total time taken by the
program?
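Solution
The problem does not state the instruction count, so assume the total number of instructions = N = 100.

Cycles (Value)   Frequency   Product
4                0.5         2.0
9                0.3         2.7
7                0.2         1.4

S = CPI = 2.0 + 2.7 + 1.4 = 6.1

R = 8 MHz = 8 x 10^6 cycles/sec
So, T = (S x N)/R
= (6.1 x 10^2)/(8 x 10^6) sec = 76.25 μsec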
Big Endian and
Little Endian Representation

Big Endian and Little Endian Concept
Little endian and big endian are two ways of storing multibyte data types (int, float,
etc.) in memory. For a single byte no ordering is required.
In byte ordering, the "big end" byte is the "high-order byte" or the
"most significant byte"; the "little end" byte is the "low-order byte" or the "least significant byte".
The term 'endian', derived from 'end', may lead to confusion. It
denotes which end of the number comes first (at the lowest address), rather than which part comes at
the end of the sequence of bytes.
The basic endian layout can be seen in the table below:

Endianness   First Byte          Last Byte
Big          Most Significant    Least Significant
Little       Least Significant   Most Significant
Big Endian and Little Endian Concept
Big Endian Byte Order
The most significant byte (the "big end") of the data is placed at the byte with the
lowest address. The rest of the data is placed in order in the next bytes in memory.
Example: Motorola Computer/Machine

Little Endian Byte Order
The least significant byte (the "little end") of the data is placed at the byte with the
lowest address. The rest of the data is placed in order in the next bytes in memory.
Example: Intel 80x86 Computer/Machine
Big Endian and Little Endian Concept
Example: Represent the integer 0x01234567 (written in hexadecimal) in Big
Endian and Little Endian machines.
Answer:

BIG ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   01    23    45    67

LITTLE ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   67    45    23    01
Examples On
Big Endian and Little Endian Representation

Big Endian and Little Endian Concept
Example: Represent the integer 14342 in Big Endian and Little Endian
machines.

Answer: 1. Convert the number into binary (32 bits):
14342 = (0000 0000 0000 0000 0011 1000 0000 0110)2

2. Write the hexadecimal equivalent of the binary combination
obtained in step 1.
14342 = (0000 0000 0000 0000 0011 1000 0000 0110)2 = 00 00 38 06H

LITTLE ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   06    38    00    00

BIG ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   00    00    38    06
Big Endian and Little Endian Concept
Example: Represent the integer -14342 in Big Endian and Little Endian
machines.
Answer: 1. Convert the magnitude into binary (32 bits):
14342 = (0000 0000 0000 0000 0011 1000 0000 0110)2
2. Take the 2's complement of the binary combination obtained in
step 1.
-14342 = (1111 1111 1111 1111 1100 0111 1111 1010)2
3. Write the hexadecimal equivalent of the binary combination obtained in
step 2.
-14342 = (1111 1111 1111 1111 1100 0111 1111 1010)2 = FF FF C7 FAH

LITTLE ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   FA    C7    FF    FF

BIG ENDIAN
Address    0x100 0x101 0x102 0x103
Contents   FF    FF    C7    FA
Big Endian and Little Endian Concept
Example: Represent the long integer a = 14342 in Big Endian and Little Endian
machines.

Answer: 1. Convert the number into binary (64 bits):
14342 =
(0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1000 0000 0110)2
2. Write the hexadecimal equivalent of the binary combination
obtained in step 1.
14342 = 00 00 00 00 00 00 38 06H

LITTLE ENDIAN
Address    0x100 0x101 0x102 0x103 0x104 0x105 0x106 0x107
Contents   06    38    00    00    00    00    00    00

BIG ENDIAN
Address    0x100 0x101 0x102 0x103 0x104 0x105 0x106 0x107
Contents   00    00    00    00    00    00    38    06
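The byte layouts in these examples can be reproduced with Python's built-in `int.to_bytes`. The helper `memory_layout` and the base address 0x100 are illustrative assumptions that mirror the tables above.

```python
def memory_layout(value, size, byteorder, base=0x100, signed=False):
    """Return (address, byte) pairs for value stored in `size` bytes."""
    data = value.to_bytes(size, byteorder=byteorder, signed=signed)
    return [(hex(base + i), f"{b:02X}") for i, b in enumerate(data)]

print(memory_layout(0x01234567, 4, "big"))
print(memory_layout(0x01234567, 4, "little"))
print(memory_layout(-14342, 4, "little", signed=True))
```

The same call with `size=8` gives the 64-bit layout of the long integer example.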
GATE Question On
Big Endian and Little Endian Representation

GATE 2021
If the numerical value of a 2-byte unsigned integer on a little endian computer
is 255 more than that on a big endian computer, which of the following
choices represent(s) the unsigned integer on a little endian computer?
a. 0x6665
b. 0x0001
c. 0x4243
d. 0x0100

Solution: For each option, swapping the two bytes of the little endian value gives the value seen on the big endian computer.
Given, Little endian = 255 + Big endian, where 255 = 1111 1111 = 0xFF.

Let's take option a.
Little endian = 0x6665, so big endian = 0x6566
0x6566 + 0x00FF = 0x6665
So, option a is correct.

Let's take option b.
Little endian = 0x0001, so big endian = 0x0100
0x0100 + 0x00FF = 0x01FF, which is not 0x0001
So, option b is incorrect.

Let's take option c.
Little endian = 0x4243, so big endian = 0x4342
0x4342 + 0x00FF = 0x4441, which is not 0x4243
So, option c is incorrect.

Let's take option d.
Little endian = 0x0100, so big endian = 0x0001
0x0001 + 0x00FF = 0x0100
So, option d is correct.

So, options a and d are both correct.
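The option-by-option check can also be done in a few lines of Python. This is a sketch of the same reasoning: swap the two bytes and test whether the difference is 255. The function name `byte_swapped` is hypothetical.

```python
def byte_swapped(value):
    """Swap the two bytes of a 16-bit unsigned value."""
    return ((value & 0xFF) << 8) | (value >> 8)

options = {"a": 0x6665, "b": 0x0001, "c": 0x4243, "d": 0x0100}

# An option is correct if the little endian value is 255 more than
# the value obtained by reading the same bytes in big endian order.
correct = [name for name, little in options.items()
           if little - byte_swapped(little) == 255]
print(correct)
```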
GATE QUESTIONS ON INSTRUCTION
FORMATS
Bindu Agarwalla

GATE 2018
A processor has 16 integer registers (R0, R1, … , R15) and 64 floating point
registers (F0, F1, … , F63). It uses a 2-byte instruction format. There are four
categories of instructions: Type-1, Type-2, Type-3, and Type 4. Type-1
category consists of four instructions, each with 3 integer register operands
(3Rs). Type-2 category consists of eight instructions, each with 2 floating
point register operands (2Fs). Type-3 category consists of fourteen
instructions, each with one integer register operand and one floating point
register operand (1R+1F). Type-4 category consists of N instructions, each
with a floating point register operand (1F). The maximum value of N is ______.
(A) 32
(B) 64
(C) 256
(D) 512
GATE 2018
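# of integer registers = 16, i.e., 4 bits (2^4 = 16) are required to represent an integer
register.
# of fp registers = 64, i.e., 6 bits (2^6 = 64) are required to represent a fp register.

Instruction formats (16 bits, field widths in bits):

Type-1: opcode (4) | int reg (4) | int reg (4) | int reg (4)
Type-2: opcode (4) | fp reg (6) | fp reg (6)
Type-3: opcode (6) | int reg (4) | fp reg (6)
Type-4: opcode (10) | fp reg (6)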
GATE 2018
No of encodings used by Type-1 instructions: 4 x 2^12

No of encodings used by Type-2 instructions: 8 x 2^12

No of encodings used by Type-3 instructions: 14 x 2^10

Total no of encodings in a 16-bit instruction: 2^16

No of encodings left for Type-4 instructions: 2^16 - [2^2 x 2^12 + 2^3 x 2^12 + 14 x 2^10]
= 2^16 - [2^11 x (2^3 + 2^4 + 7)]
= 2^16 - [2^11 x 31]
= 2^11 x (2^5 - 31) = 2^11

Given, N x 2^6 = 2^11
N = 2^11 / 2^6 = 2^5
N = 32
GATE 2020
A processor has 64 registers and uses 16-bit instruction format. It has two types
of instructions: I-type and R-type. Each I-type instruction contains an opcode, a
register name, and a 4-bit immediate value. Each R-type instruction contains an
opcode and two register names. If there are 8 distinct I-type opcodes, then the
maximum number of distinct R-type opcodes is _____.
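# of registers = 64, i.e., 6 bits (2^6 = 64) are required to represent a register.

Instruction formats (16 bits, field widths in bits):

I-Type: opcode (6) | Reg (6) | Imm (4)
R-Type: opcode (4) | Reg (6) | Reg (6)

Let N be the number of R-Type opcodes possible.
N x 2^12 = 2^16 - 8 x 2^6 x 2^4
N x 2^12 = 2^16 - 2^3 x 2^6 x 2^4 = 2^16 - 2^13
N = 2^4 - 2^1 = 14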
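The GATE 2018 and GATE 2020 instruction-format answers use the same counting argument: subtract the encodings already taken from 2^16 and divide what is left by the encodings each remaining instruction needs. A minimal Python sketch, with a hypothetical helper `remaining_opcodes`:

```python
def remaining_opcodes(instr_bits, used, operand_bits):
    """
    instr_bits:   instruction width in bits
    used:         list of (number of opcodes, operand bits per instruction)
    operand_bits: operand bits of the instruction type being counted
    """
    total = 1 << instr_bits
    taken = sum(count << bits for count, bits in used)
    return (total - taken) >> operand_bits

# GATE 2018: Type-1 (4 opcodes, 12 operand bits), Type-2 (8, 12), Type-3 (14, 10);
# each Type-4 instruction needs 6 operand bits.
n_2018 = remaining_opcodes(16, [(4, 12), (8, 12), (14, 10)], 6)

# GATE 2020: 8 I-type opcodes with 6 + 4 operand bits;
# each R-type instruction needs 12 operand bits.
n_2020 = remaining_opcodes(16, [(8, 10)], 12)

print(n_2018, n_2020)
```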
Numerical 3
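The content of register R1 is 10110011. What will be the decimal value after execution
of the following instruction? [Assume the number is represented in 2's complement format]
AShiftL #2, R1

C  R1
0  1 0 1 1 0 0 1 1    initial contents

AShiftL #2, R1

1  0 1 1 0 0 1 1 0    after the 1st shift
0  1 1 0 0 1 1 0 0    after the 2nd shift

Here, R1 contains -77 before the shift operation, and after AShiftL by 2 positions
it contains 11001100 = -52. The sign bit changes during the first shift, so overflow occurs: the true product -77 x 4 = -308 does not fit in 8 bits.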
Numerical 4
Execute the following instructions, where R0 is 8 bits wide and its content is
11001011.
i) LShiftL #2, R0 ii) AShiftR #1, R0

C  R0
0  1 1 0 0 1 0 1 1    initial contents

LShiftL #2, R0

1  1 0 0 1 0 1 1 0    after the 1st shift
1  0 0 1 0 1 1 0 0    after the 2nd shift

i) After LShiftL #2, R0 contains 00101100 and C = 1.
ii) Applied to the original content 11001011 (-53 in 2's complement), AShiftR #1 gives
11100101 = -27 with C = 1.
Basic Processing Unit
Bindu Agarwalla

Fundamental Concepts

The processor fetches one instruction at a time and performs the operation specified.

Instructions are fetched from successive memory locations until a branch or a
jump instruction is encountered.

The processor keeps track of the address of the memory location containing the next
instruction to be fetched using the Program Counter (PC).

The Instruction Register (IR) holds the instruction that is currently being executed.

Executing an Instruction
Fetch the contents of the memory location pointed to by the PC. The contents
of this location are loaded into the IR (fetch phase).
IR ← [[PC]]

Assuming that the memory is byte addressable and each instruction occupies 4 bytes, increment the contents of the
PC by 4 (fetch phase).
PC ← [PC] + 4

Carry out the actions specified by the instruction in the IR (execution phase).
Single Bus CPU Organization
Basic Operations involved in the execution
of an instruction
Transfer a word of data from one processor register to another or to the ALU.

Perform an arithmetic or a logic operation and store the result in a processor register.

Fetch the contents of a given memory location and load them into a processor register.

Store a word of data from a processor register into a given memory location.
Register Transfer Operation
MOV R1, R2

1. R1out, R2in
Fetching a Word from Memory
MOV (R1), R2

Step 1: Transfer the address into MAR.
Step 2: Issue the Read operation.
Step 3: Wait for the data from memory.
Step 4: Transfer the data from MDR to the destination.

1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Storing a Word into Memory
MOV R2, (R1)
Address into MAR.
Data into MDR.
Issue the Write operation.
Wait for the Write operation to complete.

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Performing an Arithmetic or
Logical Operation
ADD R1, R2, R3
Put one of the operands into Y register
Put the other operand on the bus and
perform the operation
Send the result into the destination

1. R1out, Yin

2. R2out, SelectY, Add, Zin

3. Zout, R3in
Execution of a Complete Instruction
Add (R3), R1

Fetch the instruction


Fetch the first operand (the contents of
the memory location pointed to by R3)
Perform the addition

Store the result into R1


Execution of a Complete Instruction
Add (R3), R1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
Execution of a Complete
Instruction: Register Indirect Mode
Add R1, (R3)

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. R3out, MARin
8. Zout, MDRin, Write
9. WMFC
10. End
Execution of a Complete Instruction: Immediate mode
MOV #300, R1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_Field_of_IRout, R1in, End
Control Sequence for Execution of a
Complete Instruction
for Register Indirect Mode and
Immediate mode operand

Bindu Agarwalla

Execution of a Complete Instruction: Direct mode
MOV 300, R1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_Field_of_IRout, MARin, Read
5. WMFC
6. MDRout, R1in, End
Execution of a Complete Instruction: Indirect mode
MOV (300), R1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_Field_of_IRout, MARin, Read
5. WMFC
6. MDRout, MARin, Read
7. WMFC
8. MDRout, R1in, End
Execution of a Complete Instruction: Index mode
Add 30(R1), R2

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Yin
5. Address_Field_of_IRout, SelectY, Add, Zin
6. Zout, MARin, Read
7. R2out, Yin, WMFC
8. MDRout, SelectY, Add, Zin
9. Zout, R2in, End
Execution of a Complete Instruction: Relative mode
JMP L1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_Field_of_IRout, SelectY, Add, Zin
5. Zout, PCin, End
Execution of a Complete Instruction: Relative mode (conditional branch)
BR<0 L1

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_Field_of_IRout, SelectY, Add, Zin, If N == 0 then End
5. Zout, PCin, End
Execution of a Complete Instruction: Autodecrement mode
MUL -(R1), R2

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Select4, Sub, Zin
5. Zout, R1in, MARin, Read
6. R2out, Yin, WMFC
7. MDRout, SelectY, Mul, Zin
8. Zout, R2in, End
Execution of a Complete Instruction: Autoincrement mode
MUL (R1)+, R2

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, MARin, Read, Select4, Add, Zin
5. Zout, R1in
6. R2out, Yin, WMFC
7. MDRout, SelectY, Mul, Zin
8. Zout, R2in, End
MultiBus CPU Organization

Fig. Three Bus CPU Organization


MultiBus CPU Organization
Add R4, R5, R6

1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4outA, R5outB, SelectA, ADD, R6in, End

MultiBus CPU Organization
Add (R4), R5

1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4outB, R=B, MARin, Read
5. WMFC
6. MDRoutB, R5outA, SelectA, ADD, R5in, End
Hardwired Control Unit
To execute instructions, the processor must have some means of generating
the control signals needed in the proper sequence.
Two categories:
Hardwired control unit and
Microprogrammed control unit

A hardwired system can operate at high speed, but with little flexibility.
This type of CU is designed using a number of combinational and sequential circuits
such as gates, flip-flops (FFs), decoders, encoders and other digital circuits.

For the CU to perform its function, it has inputs that allow it to determine the state of
the system and outputs that allow it to control the behaviour of the system.

These inputs and outputs are the external specification of the CU.

Internally, the CU must have some logic to perform its sequencing and execution
functions.
Hardwired Control Unit
Inputs to CU:
1. Clock (contents of the control step counter)
2. Instruction register (contents of IR)
3. Flags (contents of the condition codes)
4. Control signals from the external bus (e.g., MFC)
Hardwired Control Unit
Steps used in generating a control signal (e.g., Zin):
1. Find the control sequence for each instruction supported by the ISA.

2. Next, find the instructions in which the signal appears and, for each of them, the
step number in which it appears.

3. Say Zin appears in the 6th step of the instruction Add (R3), R1. Then Zin must be
generated when both conditions are true: the instruction is ADD and the step is 6
(ADD.T6). Similarly, for the JMP L1 instruction, Zin is generated in step 4 (JMP.T4).
Zin must be generated in either of these two cases.

4. So, the logic function for Zin is the OR of the two AND terms above:
Zin = ADD.T6 + JMP.T4 + ... [+ ... indicates other possible cases]

5. Also, Zin is required in step 1, during the fetch phase, irrespective of the
instruction being executed. So,
Zin = T1 + ADD.T6 + JMP.T4 + ...
Generating the Zin Signal
Add (R3), R1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
JMP L1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_field_of_IRout, SelectY, Add, Zin
5. Zout, PCin, End
Generating the Zin Signal
BR<0 L1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Address_field_of_IRout, SelectY, Add, Zin, if N==0 then End
5. Zout, PCin, End

Step 2: Zin is active in step 1 for all the instructions, i.e., it does not depend on the instruction.
Next, Zin is active in step 6 for the ADD instruction,
Zin is active in step 4 for the JMP instruction,
Zin is active in step 4 for the BR<0 (BRN) instruction, ...

Step 3:
Logic function for Zin signal

Zin = T1 + ADD.T6 + JMP.T4 + BRN.T4.N + ...

Logic function for End signal

End = ADD.T7 + JMP.T5 + BRN.T5.N + BRN.T4.N' + ...
(N' denotes the complement of N: BR<0 ends in step 4 when N == 0.)
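The derivation above can be sketched in code. This is only an illustrative sketch: the function name logic_function and the dictionary layout are assumptions, and only the unconditional ADD and JMP sequences from these notes are included (the conditional BR<0 case is left out). A signal that appears in the same step of every listed instruction is reduced to the bare step term, as was done for T1.

```python
from collections import defaultdict

FETCH = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
]

# Control sequences copied from the notes; list index k is time step T(k+1).
SEQUENCES = {
    "ADD": FETCH + [            # Add (R3), R1
        ["R3out", "MARin", "Read"],
        ["R1out", "Yin", "WMFC"],
        ["MDRout", "SelectY", "Add", "Zin"],
        ["Zout", "R1in", "End"],
    ],
    "JMP": FETCH + [            # JMP L1
        ["Address_field_of_IRout", "SelectY", "Add", "Zin"],
        ["Zout", "PCin", "End"],
    ],
}


def logic_function(signal, sequences=SEQUENCES):
    """Return the sum-of-products expression for one control signal."""
    by_step = defaultdict(list)
    for instr, steps in sequences.items():
        for t, signals in enumerate(steps, start=1):
            if signal in signals:
                by_step[t].append(instr)

    terms = []
    for t in sorted(by_step):
        instrs = by_step[t]
        if len(instrs) == len(sequences):
            # Active for every instruction in this step: instruction-independent.
            terms.append(f"T{t}")
        else:
            terms.extend(f"{i}.T{t}" for i in instrs)
    return f"{signal} = " + " + ".join(terms)


print(logic_function("Zin"))  # Zin = T1 + JMP.T4 + ADD.T6
print(logic_function("End"))  # End = JMP.T5 + ADD.T7
```

Adding more instructions to SEQUENCES extends each expression with further AND terms, which corresponds to the "+ ..." in the expressions above.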


A hardwired CPU uses 10 control signals S1 to S10, in various time
steps T1 to T5, to implement 4 instructions I1 to I4 as shown below:

      T1           T2            T3           T4       T5
I1    S1, S3, S5   S2, S4, S6    S1, S7       S10      S3, S8
I2    S1, S3, S5   S8, S9, S10   S5, S6, S7   S6       S10
I3    S1, S3, S5   S2, S6, S7    S2, S6, S9   S10      S1, S3
I4    S1, S3, S5   S2, S6, S7    S5, S10      S6, S9   S10

Write the expressions to represent the circuit for generating control signals S5
and S10 respectively. What will be the specification of the step decoder and
instruction decoder in the hardwired control unit?
A computer has 58 instructions; each instruction requires at most 15
steps to complete its execution. What will be the specification of
instruction and step counter decoder used in hardware control unit
design?
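One way to reason about such decoder questions is to find the smallest n for which an n-to-2^n decoder has enough outputs. The helper below is a hypothetical sketch (the name decoder_spec is not from the notes) and assumes plain binary-encoded opcodes and step counts.

```python
def decoder_spec(count):
    """Return (inputs, outputs) of the smallest n-to-2^n decoder covering `count` items."""
    n = max(1, (count - 1).bit_length())  # smallest n with 2**n >= count
    return n, 2 ** n


print(decoder_spec(58))  # instruction decoder: (6, 64)
print(decoder_spec(15))  # step decoder: (4, 16)
```

Under these assumptions, 58 instructions need a 6-to-64 instruction decoder and 15 steps need a 4-bit step counter feeding a 4-to-16 step decoder.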
A hardwired CPU has only 3 instructions I1, I2 and I3, which use the
following signals in time steps T1-T5:

      T1                T2           T3          T4          T5
I1    Ain, Bout, Cin    PCout, Bin   Zout, Ain   Bin, Cout   End
I2    Cin, Bout, Din    Aout, Bin    Zout, Ain   Bin, Cout   End
I3    Din, Aout         Ain, Bout    Zout, Ain   Dout, Ain   End

Write the logic function for generating the signal Ain.


Consider an example of memory organization as shown in the table
below. Which value will be loaded into the accumulator when the
instruction “LOAD DIRECT 3” is executed?

Memory location address   0    1    2    3    4    5    6    7
Content                   10   23   25   20   12   3    1    2

3
25
12
20
Consider an example of memory organization as shown in the table
below. Which value will be loaded into the accumulator when the
instruction “LOAD INDIRECT 7” is executed?

Memory location address   0    1    2    3    4    5    6    7
Content                   10   23   25   20   12   3    1    2

2
25
7
20
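The two questions above can be checked with a small sketch. The function names load_direct and load_indirect are hypothetical, and the list simply mirrors the memory table: in direct mode the operand field is the address of the operand, while in indirect mode it is the address of a location that holds the operand's address.

```python
# Contents of memory locations 0..7, copied from the table above.
MEMORY = [10, 23, 25, 20, 12, 3, 1, 2]


def load_direct(address, memory=MEMORY):
    """Direct mode: the operand is the content of `address`."""
    return memory[address]


def load_indirect(address, memory=MEMORY):
    """Indirect mode: `address` holds the effective address of the operand."""
    effective_address = memory[address]
    return memory[effective_address]


print(load_direct(3))    # 20
print(load_indirect(7))  # location 7 holds 2, and location 2 holds 25
```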
Consider a three-word machine instruction:

ADD A[R0], @B

The first operand (destination) “A[R0]” uses indexed addressing mode
with R0 as the index register. The second operand (source)
“@B” uses indirect addressing mode. A and B are memory addresses
residing at the second and the third words, respectively. The first word of
the instruction specifies the opcode, the index register designation and
the source and destination addressing modes. During execution of the ADD
instruction, the two operands are added and stored in the destination
(first operand).

What is the number of memory cycles needed during the execution cycle of the
instruction?
Microprogrammed Control Unit
An alternative to a hardwired control unit is a microprogrammed control unit, in which the
logic of the control unit is specified by a microprogram.

A microprogram consists of a sequence of instructions in a microprogramming language.

These are very simple instructions that specify micro-operations.

The term microprogram was first coined by M. V. Wilkes in the early 1950s.
Introduction to Microprogrammed CU
Add (R3), R1
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End

Each microinstruction has one bit per control signal, in this order:
PCout, MARin, MDRout, Read, Select, Add, Zin, Zout, PCin, Yin, WMFC, IRin, R3out, R1out, R1in, R2in, R2out, Sub, XOR, MUL, Div, Rin, R4in, R4out, Write, ..., End

The table lists the first 13 bits (PCout to R3out) of microinstructions 1 to 4; all their remaining bits are 0.

μinstruction   PCout MARin MDRout Read Select Add Zin Zout PCin Yin WMFC IRin R3out
1              1     1     0      1    1      1   1   0    0    0   0    0    0
2              0     0     0      0    0      0   0   1    1    1   1    0    0
3              0     0     1      0    0      0   0   0    0    0   0    1    0
4              0     1     0      1    0      0   0   0    0    0   0    0    1
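The encoding of control steps into control words can be sketched as follows. This is an illustration under stated assumptions: the signal list is shortened to the 16 signals that Add (R3), R1 actually uses (the table has more columns), the single Select bit is taken to be 1 for Select4 and 0 for SelectY, and the names SIGNALS, ADD_ROUTINE and control_word are hypothetical.

```python
# Assumed bit order: the first 13 columns of the table, then R1out, R1in, End.
SIGNALS = ["PCout", "MARin", "MDRout", "Read", "Select", "Add", "Zin", "Zout",
           "PCin", "Yin", "WMFC", "IRin", "R3out", "R1out", "R1in", "End"]

# Microroutine for Add (R3), R1; "Select" means Select4, its absence means SelectY.
ADD_ROUTINE = [
    ["PCout", "MARin", "Read", "Select", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]


def control_word(active):
    """Encode the active signals of one step as a bit string (one bit per signal)."""
    unknown = set(active) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return "".join("1" if s in active else "0" for s in SIGNALS)


rows = [control_word(step) for step in ADD_ROUTINE]
for number, word in enumerate(rows, start=1):
    print(number, word)
```

The first four words printed agree with the first 13 bits of the table rows above; the last three extend the same idea to steps 5 to 7.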
Terms Related to Microprogrammed Control Unit

Microinstruction (Control Word): For each micro-operation the CU generates a set of
control signals. Thus for any micro-operation, each control line emanating from the CU is either
on or off. This condition can be represented by a binary digit for each control line. Each
row of the previous table represents a microinstruction, i.e., in a microinstruction each bit
position is fixed for a particular control signal; if a signal is generated in that
microinstruction, the bit at that position is 1, else it is 0.

Microroutine: A sequence of control words corresponding to a particular instruction is
called the microroutine for that instruction. The previous table represents the microroutine
for the instruction Add (R3), R1.

Control Store: The instruction set of any computer is finite. The microroutines for
all the instructions in the instruction set are stored in a special memory called the
control store (control memory).

μPC: The μPC points to the next microinstruction that needs to be fetched from the control
store.
Organization of Control Memory
The fig. shows how the control words or
microinstructions could be arranged in a control
memory.

The microinstructions in each routine are
to be executed sequentially.
Each routine ends with a branch or jump
instruction indicating where to go next.

There is a special execute cycle routine whose only
purpose is to signify that one of the machine
instruction routines (AND, ADD, and so on) is to
be executed next, depending on the current opcode.
Typical Microinstruction Format

There is one bit for each internal processor control line and one bit for each system bus control line.

There is a condition field indicating the condition under which there should be a branch, and there
is a field with the address of the microinstruction to be executed next when a branch is taken.
Interpretation
To execute this microinstruction, turn on all the control lines indicated by a 1 bit; leave off all control
lines indicated by a 0 bit. The resulting control signals will cause one or more micro-operations to be
performed.

If the condition indicated by the condition bits is false, execute the next microinstruction in sequence.

If the condition indicated by the condition bits is true, the next microinstruction to be executed is
indicated in the address field.
Microprogrammed CU

To execute any instruction, the CU should first find the starting address of the
corresponding microroutine and then can generate the control signals in sequence
by reading the control words one by one.
Microprogrammed Control Unit
In this control unit, the μPC is incremented every time a new microinstruction
is fetched from the microprogram memory, except in the following situations:

1. When a new instruction is loaded into the IR, the μPC is loaded with the starting
address of the microroutine for that instruction.

2. When a Branch microinstruction is encountered and the branch condition is
satisfied, the μPC is loaded with the branch address.

3. When an End microinstruction is encountered, the μPC is loaded with the address
of the first CW in the microroutine for the instruction fetch cycle.
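The μPC update rules listed above can be written as a small function. This is a sketch only: the function name next_upc, its parameters and the choice of address 0 for the fetch microroutine are assumptions made for illustration, not part of the notes.

```python
def next_upc(upc, *, routine_start=None, branch_taken=False,
             branch_address=None, is_end=False, fetch_start=0):
    """Return the next value of the μPC.

    routine_start: starting address of the microroutine, given only when a new
                   instruction has just been loaded into the IR (rule 1).
    branch_taken / branch_address: a Branch microinstruction whose condition
                   is satisfied (rule 2).
    is_end: an End microinstruction was encountered (rule 3).
    fetch_start: assumed address of the first CW of the fetch microroutine.
    """
    if is_end:
        return fetch_start
    if routine_start is not None:
        return routine_start
    if branch_taken:
        if branch_address is None:
            raise ValueError("a taken branch needs a branch address")
        return branch_address
    return upc + 1  # default: next microinstruction in sequence


print(next_upc(5))                                        # 6
print(next_upc(2, routine_start=25))                      # 25
print(next_upc(10, branch_taken=True, branch_address=40)) # 40
print(next_upc(31, is_end=True))                          # 0
```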
2’s Complement Numbers
And
Condition Codes
Bindu Agarwalla
Register Transfer Notation
Identify a location by a symbolic name standing for its hardware binary
address (LOC, R0, ...).
Contents of a location are denoted by placing square brackets around the
name of the location.
R1 ← [LOC]
R3 ← [R1] + [R2]

This type of notation is known as Register Transfer Notation (RTN).

In an RTN expression, the right-hand side always denotes a value, and the
left-hand side represents the name of a location where the value is to be placed,
overwriting the previous contents of that location.
Representing Signed No in 2’s Complement Method
To represent a -ve number in the 2’s complement method using n bits, first
represent its magnitude in binary using n bits and then take the 2’s complement of
that binary pattern. Another way is simply to subtract the magnitude from $2^n$.

The 2’s complement of a binary number can be taken by copying its bits from
the LSb up to and including the first 1, and then flipping all the remaining bits.

Example: -14 in 8 bits:
The binary of 14 in 8 bits is 00001110.
Next take the 2’s complement of 00001110:
copy from the LSb into the result up to the first 1 (inclusive), then flip all
the remaining bits.
Hence the result is 11110010.
So, -14 = 11110010
In the other method: $2^8 - 14 = 256 - 14 = 242$ = 11110010 (128+64+32+16+2)
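The "subtract from $2^n$" method above maps directly onto modular arithmetic. The following sketch uses hypothetical helper names (to_twos_complement, from_twos_complement) and represents bit patterns as strings.

```python
def to_twos_complement(value, n_bits):
    """Return the n-bit 2's complement pattern of a signed integer as a string."""
    lowest, highest = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # For a negative value, value % 2**n equals 2**n - |value|.
    return format(value % (1 << n_bits), f"0{n_bits}b")


def from_twos_complement(bits):
    """Return the signed integer encoded by a 2's complement bit string."""
    raw = int(bits, 2)
    if bits[0] == "1":          # sign bit set: subtract 2**n
        raw -= 1 << len(bits)
    return raw


print(to_twos_complement(-14, 8))        # 11110010
print(to_twos_complement(14, 8))         # 00001110
print(from_twos_complement("11110010"))  # -14
```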
Representing Signed No in 2’s Complement Method
For a +ve Number, to represent in the 2’s complement method, just write the binary of
the number.

Example: + 14 in 8 bits:
The binary of 14 in 8 bits, i.e., 00001110

So, +14= 00001110


Representing Signed No in 2’s Complement Method

The range of values represented using n bits is:
$-2^{n-1}$ to $+2^{n-1} - 1$

The range of values represented using 4 bits is:
$-2^{4-1}$ to $+2^{4-1} - 1$ ($-2^{3}$ to $2^{3} - 1$: -8 to +7)

And in the 2’s complement form there is only 1 representation of 0.

b3b2b1b0   2’s complement value
0111       +7
0110       +6
0101       +5
0100       +4
0011       +3
0010       +2
0001       +1
0000       +0
1000       -8
1001       -7
1010       -6
1011       -5
1100       -4
1101       -3
1110       -2
1111       -1
Addition and Subtraction of n-bit signed
numbers using 2’s complement representation
Rule 1: To add two numbers, simply add their n-bit representations, discarding
the carry-out signal from the MSb (most significant bit position).
If the sum is in the range $-2^{n-1}$ to $+2^{n-1} - 1$
for n-bit numbers, then the sum will be the algebraically correct value in the 2’s
complement representation.

Rule 2: To perform the subtract operation X - Y, form the 2’s complement of Y [as the -ve
of a number is obtained by taking the 2’s complement of that number] and then
add it to X, as in Rule 1.
If the result is in the range $-2^{n-1}$ to $+2^{n-1} - 1$
for n-bit numbers, then the result will be the algebraically correct value in the 2’s
complement representation.
Condition Code Register / Status Register
It is a special-purpose CPU register. This register is used to keep track of
information about the results of various operations.

This information is used by subsequent conditional branch
instructions to decide whether to take a branch or not.

For example, after an arithmetic operation is performed, whether the result is zero or
negative, and whether a carry was generated or an overflow occurred, is
recorded in the flag register. By looking at the content of the flag register,
it can be decided whether a branch has to be taken or not.

Basically, a flag register contains a collection of bits, where each bit position
indicates a particular condition that may occur when an instruction is
executed.
Condition Codes
Four commonly used flags are:

N (Negative): Set to 1 if the result is negative, else cleared to 0.

Z (Zero): Set to 1 if the result is 0, else cleared to 0.

V (Overflow): Set to 1 if arithmetic overflow occurs, else cleared to 0.

C (Carry): Set to 1 if a carry-out results from the operation, else cleared to 0.

Note: Overflow occurs when the result of an arithmetic operation is outside the
range of values that can be represented with the available number of bits.
Overflow Condition
Overflow can occur only when adding two numbers of the same sign.

The carry-out signal from the sign bit position is not a sufficient indicator of
overflow when adding signed numbers.

The overflow condition can be detected by examining the signs of the two
summands X and Y and the sign of the result S.

When both operands X and Y have the same sign, an overflow occurs when the
sign of S does not match the sign of X and Y.
Overflow Condition [Example 1]
For example, add +7 and +4
+7 = 0111
+4 = 0100

  0111
+ 0100
  1011

So, the addition of +7 and +4 generates an overflow condition, as the result is outside the
range of values that can be represented using 4 bits.

How to interpret the result?
As the numbers are represented in 2’s complement form, and the sign bit is 1,
the result is a negative quantity. To get the magnitude, take the 2’s
complement of the result (1011).
2’s complement of 1011 is 0101, $(0101)_2 = (5)_{10}$
So, the result is -5
i.e., (+7) + (+4) = -5, an incorrect result.
As (+7) + (+4) = +11, to represent +11 we need 5 bits: 1 for the sign and 4 bits for the
magnitude.
Overflow Condition [Example 2]
For example, add -4 and -6
-4 = 1100
-6 = 1010

   1100
+  1010
  10110   (the carry-out is discarded from the result)

So, the addition of -4 and -6 generates an overflow condition, as the result is outside the
range of values that can be represented using 4 bits.
However, this time a carry is generated out of the MSb addition.

How to interpret the result?
As the numbers are represented in 2’s complement form, and the sign bit is 0,
the result is a positive quantity. To get the magnitude, just write the decimal
equivalent of the result bits.
$(0110)_2 = (6)_{10}$
So, the result is +6
i.e., (-4) + (-6) = +6, an incorrect result.
As (-4) + (-6) = -10, to represent -10 we need 5 bits: 1 for the sign and 4 bits for the
magnitude.
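The flag rules and the sign-based overflow test can be combined in one short sketch. The function name add_with_flags and the dictionary of flags are illustrative assumptions; operands are n-bit strings in 2’s complement form, and the carry-out is reported in C but discarded from the result.

```python
def add_with_flags(x_bits, y_bits):
    """Add two equal-length 2's complement bit strings; return (result, flags)."""
    n = len(x_bits)
    if len(y_bits) != n:
        raise ValueError("operands must have the same width")
    total = int(x_bits, 2) + int(y_bits, 2)
    result = format(total % (1 << n), f"0{n}b")  # drop the carry-out
    flags = {
        "C": total >> n,                          # carry-out of the MSb
        "Z": int(set(result) == {"0"}),           # all result bits are 0
        "N": int(result[0]),                      # sign bit of the result
        # Overflow: operands share a sign and the result's sign differs.
        "V": int(x_bits[0] == y_bits[0] and result[0] != x_bits[0]),
    }
    return result, flags


print(add_with_flags("0111", "0100"))  # (+7) + (+4): result 1011, V = 1, C = 0
print(add_with_flags("1100", "1010"))  # (-4) + (-6): result 0110, V = 1, C = 1
```

Both calls set V, but only the second sets C, which illustrates why the carry-out alone is not a sufficient overflow indicator.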
Conditional Codes Numerical
Consider a register R1 that contains the value 10101010 and R2 that contains
11110000. What will be the values of the carry, zero, negative and overflow flags after the
execution of the instruction
ADD R1, R2 // R2 is the destination

  R1:      1 0 1 0 1 0 1 0
+ (R2):    1 1 1 1 0 0 0 0
         1 1 0 0 1 1 0 1 0

C=1, Z=0, N=1, V=0
Conditional Codes Numerical
Consider a register R1 that contains the value 11110000 and R2 that contains
00010100. What will be the values of the carry, zero, negative (sign) and overflow
flags after the execution of the instruction
SUB R2, R1 // R1 is the destination
R1 ← [R1] - [R2]
R1 ← [R1] + (-[R2])
So, we take the 2’s complement of R2’s content to obtain the negative of R2’s content.

The 2’s complement of 00010100 is 11101100.

  R1:      1 1 1 1 0 0 0 0
+ (-R2):   1 1 1 0 1 1 0 0
         1 1 1 0 1 1 1 0 0
So, CF=1, ZF=0, SF=1, OF=0
Discarding the carry-out, the result is 11011100: -(2’s complement of 11011100) = -00100100 = -36
Memory Unit
Bindu Agarwalla
Some Basic Concepts

Fig: Connection of memory to the processor. The processor’s MAR drives a k-bit address bus (up to $2^k$ addressable locations), the MDR connects to an n-bit data bus (word length = n bits), and the control lines carry R/W, MFC, etc.

Some Basic concepts
Meaning of Random Access Memory

Byte Addressable memory

Word Addressable memory

Memory Capacity
Some Basic concepts
Measures for the speed of a memory:
memory access time.
memory cycle time.

The time between the initiation of a memory operation (read/write) and
the completion of that operation (signalled by MFC) is called the memory access time.

The minimum time between the initiation of two consecutive memory operations
is called the memory cycle time.
Some Basic concepts
◼ An important design issue is to provide a computer system with as large
and fast a memory as possible, within a given cost target.

◼ Several techniques to increase the effective size and speed of the memory:
▪ Cache memory (to increase the effective speed).
▪ Virtual memory (to increase the effective size).
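Organization of bit cells in a 16 x 8 Memory chip

Fig. Organization of bit cells in a 16 x 8 memory chip: address lines A0 to A3 feed the address decoder, which drives word lines w0 to w15; each row of memory cells (FF) shares one word line; each column has a pair of bit lines (b7, b’7 ... b0, b’0) connected to a Sense/Write circuit; R/W and CS are the control inputs; b7 ... b0 are the data input/output lines.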

Internal organization of memory chips
Each memory cell can hold one bit of information.

Memory cells are organized in the form of an array.

One row is one memory word.

All cells of a row are connected to a common line, known as the “word
line”.

Word line is connected to the address decoder.

Sense/write circuits are connected to the data input/output lines of the
memory chip.
No of external pins required to connect a memory chip
For 16 x 8 chip
4 address lines
8 data lines
2 (R/W + CS)
2 (Power Supply+GND )
16 (Total )
No of external pins required to connect a memory chip
For 128 x 8 chip
7 address lines
8 data lines
2 (R/W + CS)
2 (Power Supply+GND )
19 (Total )
No of external pins required to connect a memory chip
For 1K x 1 chip
10 address lines
1 data line
2 (R/W + CS)
2 (Power Supply+GND )
15 (Total )
No of external pins required to connect a memory chip
For 64 x 16 chip
6 address lines
16 data lines
2 (R/W + CS)
2 (Power Supply+GND )
26 (Total )
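The four pin counts above follow one formula: address lines + data lines + 2 control pins (R/W, CS) + 2 power pins (supply, GND). A minimal sketch, with the hypothetical helper name pin_count and assuming the number of words is a power of two:

```python
def pin_count(words, bits_per_word, control_pins=2, power_pins=2):
    """External pins for a `words` x `bits_per_word` memory chip."""
    address_lines = (words - 1).bit_length()  # log2(words) for powers of two
    return address_lines + bits_per_word + control_pins + power_pins


for words, width in [(16, 8), (128, 8), (1024, 1), (64, 16)]:
    print(f"{words} x {width}: {pin_count(words, width)} pins")
```

This reproduces the totals of 16, 19, 15 and 26 pins listed above.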
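Internal organization of 1K x 1 memory chip

Fig. Internal organization of a 1K x 1 memory chip: the 10-bit address is split into a 5-bit row address, which a 5-bit decoder uses to select one word line of the 32 x 32 memory cell array, and a 5-bit column address, which drives the 32-to-1 output multiplexer and input demultiplexer. The Sense/Write circuitry, the R/W and CS control inputs and a single data input/output line complete the chip.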
Implementation of a SRAM Cell

Two inverters are cross connected to implement a basic storage element “latch”.

The cell is connected to one word line and two bits lines by transistors T1 and
T2.

When word line is at ground level, the transistors are turned off and the latch
retains its state.
Implementation of a SRAM Cell

Read operation:
1. In order to read the state of the SRAM cell, the word line is activated to close switches T1
and T2.
2. Sense/Write circuits at the bottom monitor the state of b and b’ and set the output
accordingly.
3. If the cell is in state 1, the signal on bit line b is high and the signal on bit line b’ is
low.
4. The opposite is true if the cell is in state 0.
Implementation of a SRAM Cell

Write operation:
1. The state of the cell is set by placing the appropriate value on bit line b and its
complement on b’, and then activating the word line.

2. This forces the cell into the corresponding state.

3. The required signals on the bit lines are generated by the Sense/Write ckt.
Implementation of a DRAM Cell
Dynamic RAM (DRAM): slow, cheap, and dense memory.

Typical choice for main memory.

Cell Implementation:
1-Transistor cell (pass transistor)
Trench capacitor (stores bit)

The bit is stored as a charge on the capacitor. The
charge can be maintained for only tens
of milliseconds.

But the cell is required to store the information for a
much longer time.

Hence, the contents must be refreshed by restoring the capacitor charge to
its full value.
Implementation of a DRAM Cell

Why is periodic refreshing required?

After the transistor is turned off, the
capacitor begins to discharge.

1. This happens due to the capacitor’s own
leakage resistance.

2. The transistor continues to conduct a tiny
amount of current, measured in
picoamperes, after it is turned off.
Implementation of a DRAM Cell
How is the periodic refresh operation performed?

During a read operation, the transistor in a selected cell is turned on.

A sense amplifier connected to the bit line detects whether the charge stored on the
capacitor is above the threshold value.
If so, it drives the bit line to a full voltage that represents logic value 1. This voltage
recharges the capacitor to the full charge that corresponds to logic value 1.

If the sense amplifier detects that the charge on the capacitor is below the threshold
value, it pulls the bit line to ground level, which ensures that the capacitor will
have no charge, representing logic value 0.

Hence, reading the contents of the cell automatically refreshes its contents.
SRAM Vs DRAM Cell
Static RAMs (SRAMs):
Consist of circuits that are capable of retaining their state as long as power is
applied.

Volatile memories, because their contents are lost when power is interrupted.

Access times of static RAMs are in the range of a few nanoseconds.

Require little power to retain a bit, as power is consumed mainly when the cell is
accessed.

However, the cost is usually high.

Dynamic RAMs (DRAMs):

Do not retain their state indefinitely.

Contents must be periodically refreshed.

Contents are also refreshed when they are accessed for reading.

DRAM Refresh Cycles
The refresh cycle is on the order of tens of milliseconds.

Refreshing is done for the entire memory.

Each row is read and written back to restore the charge.

Some of the memory bandwidth is lost to refresh cycles.

Fig. Cell voltage versus time (labels: voltage for 1, threshold voltage, voltage for 0; a 1 is written and then refreshed once per refresh cycle; a 0 is stored).
2M X 8 Memory Design
Each row can store 512 bytes. 12 bits select a row, and 9 bits
select a group (byte) in a row, for a total of 21 bits.

First apply the row address; the RAS
signal latches the row address.

Then apply the column address; the
CAS signal latches the column address.

Timing of the memory unit is
controlled by a specialized unit
which generates RAS and CAS.

This is asynchronous DRAM.
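The 21-bit address split described above can be sketched as follows. The notes do not say which bits form the row address; this sketch assumes the high-order 12 bits select the row and the low-order 9 bits select the byte within the row, and the name split_address is hypothetical.

```python
ROW_BITS = 12     # 4096 rows
COLUMN_BITS = 9   # 512 bytes per row


def split_address(address):
    """Split a 21-bit address into (row, column) for the 2M x 8 chip."""
    if not 0 <= address < (1 << (ROW_BITS + COLUMN_BITS)):
        raise ValueError("address must fit in 21 bits")
    row = address >> COLUMN_BITS                  # sent first, latched by RAS
    column = address & ((1 << COLUMN_BITS) - 1)   # sent second, latched by CAS
    return row, column


print(split_address(513))        # (1, 1)
print(split_address(0x1FFFFF))   # (4095, 511), the last byte of the chip
```

Because the row and column parts are applied one after the other, the chip can multiplex them over the same address pins.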


Burst Mode Operation
Block Transfer
Row address is latched and decoded.

A read operation causes all cells in a selected row to be read.

Selected row is latched internally inside the SDRAM chip

Column address is latched and decoded.

Selected column data is placed in the data output register.

Column address is incremented automatically.

Multiple data items are read depending on the block length.

Fast transfer of blocks between memory and cache.

Fast transfer of pages between memory and disk.


SDRAM and DDR SDRAM

SDRAM is Synchronous Dynamic RAM


Added clock to DRAM interface

SDRAM is synchronous with the system clock


Older DRAM technologies were asynchronous

DDR is Double Data Rate SDRAM


Like SDRAM, DDR is synchronous with the system clock, but the difference is
that DDR reads data on both the rising and falling edges of the clock signal.
2M X 32 Using 512K X 8 Chips
Step 1: Find how many smaller chips are required to meet the
required size:

Divide the total required size by the size of the smaller chip.

(2M x 32) / (512K x 8): $(2^{21} \times 2^{5}) / (2^{19} \times 2^{3}) = 2^{26} / 2^{22} = 2^{4} = 16$ chips

Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data size [i.e., the number of columns in the matrix arrangement].

Here, in 2M x 32, 32 bits of data are required per location, and a 512K x 8 chip
supplies 8 bits of data from one location.

So, four 512K x 8 chips need to be connected in parallel to meet the 32-bit
data size.

i.e., the 512K x 8 chips are connected in a matrix form, where the number of columns
= size of one location in the bigger memory / size of one location in the smaller chip.

No of columns = 32 / 8 = 4.
2M X 32 Using 512K X 8 Chips
Step 3: Find the number of rows:

No of rows x No of columns = Total no of chips

#rows x 4 = 16
#rows = 16 / 4 = 4

How to connect the address lines:

For the 2M x 32 memory, 21 address lines are required ($2M = 2^{21}$), and for a 512K x 8 chip,
19 address lines are required ($512K = 2^{19}$). So, out of the 21 address lines, the lower-order 19 lines
are connected to all the 512K x 8 memory chips.

Then, to select one of the 4 rows of 512K x 8 memory chips, the higher-order 2
address lines (out of the 21 address lines) are connected to a 2-to-4 decoder, and the output of
the decoder selects a particular row. From the selected row, the 4 chips of 512K x 8
give/take 8 bits of data each, meeting the required size of 32 bits.
Numerical: Design 8M X 32 bits memory using
512K X 8 bits memory chip.
Step 1: Find how many smaller chips are required to meet the
required size:

Divide the total required size by the size of the smaller chip.

(8M x 32) / (512K x 8): $(2^{23} \times 2^{5}) / (2^{19} \times 2^{3}) = 2^{28} / 2^{22} = 2^{6} = 64$ chips

Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data size [i.e., the number of columns in the matrix arrangement].

Here, in 8M x 32, 32 bits of data are required per location, and a 512K x 8 chip
supplies 8 bits of data from one location.

So, four 512K x 8 chips need to be connected in parallel to meet the 32-bit
data size.

i.e., the 512K x 8 chips are connected in a matrix form, where the number of columns
= size of one location in the bigger memory / size of one location in the smaller chip.

No of columns = 32 / 8 = 4.
Numerical: Design 8M X 32 bits memory using
512K X 8 bits memory chip.
Step 3: Find the number of rows:

No of rows x No of columns = Total no of chips

#rows x 4 = 64
#rows = 64 / 4 = 16

How to connect the address lines:

For the 8M x 32 memory, 23 address lines are required ($8M = 2^{23}$), and for a 512K x 8 chip,
19 address lines are required ($512K = 2^{19}$). So, out of the 23 address lines, the lower-order 19 lines
are connected to all the 512K x 8 memory chips.

Then, to select one of the 16 rows of 512K x 8 memory chips, the higher-order 4
address lines (out of the 23 address lines) are connected to a 4-to-16 decoder, and the output of
the decoder selects a particular row. From the selected row, the 4 chips of 512K x 8
give/take 8 bits of data each, meeting the required size of 32 bits.
Numericals
A computer employs RAM chips of 256 X 8. The computer system needs 2K bytes of
RAM. Design the memory module for this configuration.
Step 1: Find how many smaller chips are required to meet the required size:
Divide the total required size by the size of the smaller chip.
(2K x 8) / (256 x 8): $(2^{11} \times 2^{3}) / (2^{8} \times 2^{3}) = 2^{3} = 8$ chips
Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data size [i.e., the number of columns in the matrix arrangement].
Here, in 2K x 8, 8 bits of data are required per location, and a 256 x 8 chip supplies
8 bits of data from one location.

So, one 256 x 8 chip per row is enough to meet the 8-bit data
size.

No of columns = 8 / 8 = 1
Numerical: Design 2K X 8 bits memory using 256 X 8
bits memory chip.
Step 3: Find the number of rows:

No of rows x No of columns = Total no of chips

#rows x 1 = 8
#rows = 8

How to connect the address lines:

For the 2K x 8 memory, 11 address lines are required ($2K = 2^{11}$), and for a 256 x 8 chip, 8
address lines are required ($256 = 2^{8}$). So, out of the 11 address lines, the lower-order 8 lines are
connected to all the 256 x 8 memory chips.

Then, to select one of the 8 rows of 256 x 8 memory chips, the higher-order 3
address lines (out of the 11 address lines) are connected to a 3-to-8 decoder, and the output of
the decoder selects a particular row. From the selected row, the single 256 x 8 chip
gives/takes 8 bits of data, meeting the required size of 8 bits.
Numericals
A computer uses RAM chips of 256 X 4 capacity. Design a memory of 1 KB capacity
using the available chip.
Step 1: Find how many smaller chips are required to meet the required size:
Divide the total required size by the size of the smaller chip.
(1K x 8) / (256 x 4): $(2^{10} \times 2^{3}) / (2^{8} \times 2^{2}) = 2^{13} / 2^{10} = 2^{3} = 8$ chips
Step 2: Find how many smaller chips need to be connected in parallel to meet the
required data size [i.e., the number of columns in the matrix arrangement].
Here, in 1K x 8, 8 bits of data are required per location, and a 256 x 4 chip supplies
4 bits of data from one location.

So, two 256 x 4 chips need to be connected in parallel to meet the 8-bit data
size.

i.e., the 256 x 4 chips are connected in a matrix form, where the number of columns
= size of one location in the bigger memory / size of one location in the smaller chip.

No of columns = 8 / 4 = 2
Numerical: Design 1K X 8 bits memory using 256 X 4
bits memory chip.
Step 3: Find the number of rows:

No of rows x No of columns = Total no of chips

#rows x 2 = 8
#rows = 4

How to connect the address lines:

For the 1K x 8 memory, 10 address lines are required ($1K = 2^{10}$), and for a 256 x 4 chip, 8
address lines are required ($256 = 2^{8}$). So, out of the 10 address lines, the lower-order 8 lines are
connected to all the 256 x 4 memory chips.

Then, to select one of the 4 rows of 256 x 4 memory chips, the higher-order 2
address lines (out of the 10 address lines) are connected to a 2-to-4 decoder, and the output of
the decoder selects a particular row. From the selected row, the 2 chips of 256 x 4
give/take 4 bits of data each, meeting the required size of 8 bits.
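The three design steps used in the numericals above can be collected into one function. This is a sketch under stated assumptions: sizes are powers of two, the target dimensions are exact multiples of the chip dimensions, and the name design_module and the dictionary keys are hypothetical.

```python
K = 1024
M = 1024 * 1024


def design_module(target_words, target_width, chip_words, chip_width):
    """Chip matrix and address-line split for building a memory from smaller chips."""
    columns = target_width // chip_width   # chips in parallel for the data width
    rows = target_words // chip_words      # rows selected by the decoder
    total_lines = (target_words - 1).bit_length()
    chip_lines = (chip_words - 1).bit_length()
    return {
        "chips": rows * columns,
        "columns": columns,
        "rows": rows,
        "address_lines": total_lines,
        "chip_address_lines": chip_lines,              # lower-order lines, to every chip
        "decoder_inputs": total_lines - chip_lines,    # higher-order lines, to the decoder
    }


print(design_module(2 * M, 32, 512 * K, 8))
print(design_module(8 * M, 32, 512 * K, 8))
print(design_module(2 * K, 8, 256, 8))
print(design_module(1 * K, 8, 256, 4))
```

For the four cases this gives 16, 64, 8 and 8 chips, arranged as 4 x 4, 16 x 4, 8 x 1 and 4 x 2 matrices, with 2, 4, 3 and 2 decoder inputs respectively.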
Typical Memory Hierarchy
Registers are at the top of the hierarchy. Fastest storage element, present inside the
CPU, but limited in number. Access time < 0.5 ns

Level 1 Cache: On-chip cache, designed using SRAM technology. Typical size is
in the range 8 – 64 KB and access time: 1 ns

L2 Cache: Off-chip cache. Typical size is in the range 512 KB – 8 MB and access
time: 3 to 10 ns

Main Memory: Designed using DRAM technology. Typical size is in the range
8 – 16 GB and access time: 50 ns to 100 ns

Disk Storage: Serial access memory. Typical size is > 200 GB and access time: 5 – 10 ms

Fig. Memory Hierarchy: processor registers, primary cache (L1), secondary cache (L2), main memory, magnetic disk (secondary memory). Speed and cost per unit size increase towards the registers; size increases towards the disk.
Why Cache Memory is required?

Processor is much faster than the main memory.


❖ As a result, the processor has to spend much of its time waiting while instructions
and data are being fetched from the main memory.
❖ Major obstacle towards achieving good performance.

Speed of the main memory cannot be increased beyond a certain point.

Cache memory is an architectural arrangement which makes the main memory appear
faster to the processor than it really is.

Cache memory is based on the property of computer programs known as “locality
of reference”.
Why Cache Memory has become possible?
Analysis of programs indicates that many instructions in localized areas of a program
are executed repeatedly during some period of time, while the others are accessed
relatively less frequently.

These instructions may be the ones in a loop, a nested loop, or a few procedures calling
each other repeatedly.
This is called “locality of reference”.

Temporal locality of reference:


❖ Recently executed instruction is likely to be executed again very soon.
❖ If an instruction is executed at time instant t, then most likely the same
instruction will be executed at time instant t + ∆t.

Spatial locality of reference:


❖ Instructions with addresses close to a recently executed instruction are likely to be
executed soon.
❖ If an instruction at address ‘i’ is executed at time instant t, then most likely the
instruction at address ‘i+1’ will be executed at time instant t + ∆t.
What is a Cache Memory ?
Small and fast (SRAM) memory technology
Stores the subset of instructions & data currently being accessed.

Used to reduce average access time to memory.

Caches exploit temporal locality by …


Keeping recently accessed data closer to the processor.

Caches exploit spatial locality by …


Moving blocks consisting of multiple contiguous words.

Goal is to achieve
Fast speed of cache memory access.
Balance the cost of the memory system.
The Basics of Caches

Fig. Processor <-> Cache <-> Main memory

When the processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.

Subsequent references to the data in this block of words are found in the cache.

At any given time, only some blocks in the main memory are held in the cache.
Mapping functions determine where a main memory block will be placed in the
cache.
When the cache is full, and a block of words needs to be transferred from the main
memory, some block of words in the cache must be replaced. This is determined by
a “replacement algorithm”.
The Basics of Caches: Cache Hit
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.

If the data is in the cache it is called a Read or Write hit.

Read hit:
❖ The data is obtained from the cache.

Write hit:

❖ Cache has a replica of the contents of the main memory.


❖ Contents of the cache and the main memory may be updated simultaneously.
This is the write-through protocol.
❖ Update the contents of the cache, and mark it as updated by setting a bit known as
the dirty bit or modified bit. The contents of the main memory are updated when
this block is replaced. This is write-back or copy-back protocol.
The Basics of Caches: Cache Miss
If the data is not present in the cache, then a Read miss or Write miss occurs.

Read Miss:
1. Block of words containing the requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor. This
is called load-back.

2. The desired word may also be forwarded to the processor as soon as it is


transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
The Basics of Caches: Cache Miss
What happens on a write miss?

1. Write Allocate:
Allocate new block in cache.
Write miss acts like a read miss, block is fetched and updated.

2. No Write Allocate:
Sends data to lower-level memory.
Cache is not modified.

Typically, write back caches use write allocate


Hoping subsequent writes will be captured in the cache

Write-through caches often use no-write allocate


Reasoning: writes must still go to lower level memory
Block Placement
What is a block??

A block is the content of a set of consecutive memory locations.

When processor generates an address for any read/write operation, first we check
whether that address’s content is present in the cache memory or not.

If yes, then we perform our read/ write operation from the cache memory.

If not, then we bring the block containing the address that we are trying to access for
our read/write operation from the main memory. i.e., the unit of transfer between main
memory and cache is a block.

The main memory is bigger compared to the cache, hence the length of the main
memory address is more compared to the cache memory’s address.

In block placement/mapping functions we will see how to access a byte within a
block in the cache memory, for a given main memory address.
Mapping functions
Mapping functions determine how memory blocks are placed in the cache.

Three mapping functions:


1. Direct mapping
2. Associative mapping
3. Set-associative mapping.
Direct-Mapped Cache
A memory address is divided into
Block address: identifies block in memory
Block offset: to access bytes within a block

A block address is further divided into


– Index: used for direct cache access
– Tag: most-significant bits of block address
Index = Block Address mod Cache Blocks

Tag must be stored also inside cache


For block identification

A valid bit is also required to indicate


Whether a cache block is valid or not
Direct-Mapped Cache
Address length = (b + w) bits

No. of addressable units = 2^(b+w) words or bytes

Block size = line size = 2^w words or bytes

No. of blocks in main memory = 2^(b+w) / 2^w = 2^b
Number of lines in cache = k = 2^r

Size of cache = 2^(r+w) words or bytes

Size of tag = (b - r) bits
No. of main memory blocks that share one cache line = 2^b / 2^r = 2^(b-r)
= no. of blocks in main memory / no. of blocks in cache memory
Mapping functions
A simple processor example:

1. Cache consisting of 128 blocks of 16 words each.


2. Total size of cache is 2048 (2K) words.
3. Main memory is addressable by a 16-bit address.
4. Main memory has 64K words.
5. Main memory has 4K blocks of 16 words each.
Direct Mapping
Block j of the main memory maps to

j modulo 128

So, 0 maps to 0, 129 maps to 1.

More than one memory block is mapped onto


the same position in the cache.

May lead to contention for cache blocks even if


the cache is not full.

Resolve the contention by allowing new block to


replace the old block, leading to a trivial
replacement algorithm.
Direct Mapping
Memory address is divided into three fields:

1. Low order 4 bits determine one of the 16


words in a block.

2. When a new block is brought into the cache, the next 7 bits determine which cache
block this new block is placed in.

3. High order 5 bits determine which of the


possible 32 blocks is currently present in the
cache. These are tag bits.

Simple to implement but not very flexible.
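The three-field split described above can be sketched with shifts and masks. This is an illustrative sketch only; the function name `split_address` is hypothetical, and the field widths (5-bit tag, 7-bit block, 4-bit word) are those of the 128-block example.

```python
def split_address(addr, index_bits, offset_bits):
    """Split a main memory address into (tag, cache block index, word offset)."""
    offset = addr & ((1 << offset_bits) - 1)                 # low-order bits: word within block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits: cache block
    tag = addr >> (offset_bits + index_bits)                 # remaining high-order bits
    return tag, index, offset

# Word 3 of main memory block 129 (16 words per block, 128 cache blocks)
addr = 129 * 16 + 3
print(split_address(addr, index_bits=7, offset_bits=4))
```

Block 129 lands in cache block 1 (129 mod 128) with tag 1, matching the "j modulo 128" rule.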


Problem
A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a
direct-mapped manner with 64 bytes per cache block. Assume that the size of each memory
word is 1 byte.

(a)Calculate the number of bits in each of the Tag, Block, and Word fields of the memory
address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:

128, 144, 2176, 2180, 128, 2176

All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, Number of bits in the Word field = 6

Cache size = 2K byte = 2^11 bytes

Number of cache blocks = Cache size / Block size = 2^11 / 2^6 = 2^5
Therefore, Number of bits in the Block field = 5

Total number of address bits = 16


Therefore, Number of bits in the Tag field = 16 - 6 - 5 = 5

For a given 16-bit address, the 5 most significant bits, represent the Tag, the next 5 bits
represent the Block, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.

Access # 1:
Address = (128)10 = (0000000010000000)2
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)

For this address, Tag = 00000, Block = 00010, Word = 000000

Since the cache is empty before this access, this will be a cache miss

After this access, Tag field for cache block 00010 is set to 00000
Problem: Solution
Access # 2:
Address = (144)10 = (0000000010010000)2

For this address, Tag = 00000, Block = 00010, Word = 010000

Since tag field for cache block 00010 is 00000 before this access, this will be a cache hit
(because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)10 = (0000100010000000)2

For this address, Tag = 00001, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00000 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001

Access # 4:
Address = (2180)10 = (0000100010000100)2

For this address, Tag = 00001, Block = 00010, Word = 000100

Since tag field for cache block 00010 is 00001 before this access, this will be a cache hit
(address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)10 = (0000000010000000)2

For this address, Tag = 00000, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00001 before this access, this will be a cache miss
(address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00000

Access # 6:
Address = (2176)10 = (0000100010000000)2

For this address, Tag = 00001, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00000 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001
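The six accesses above can be replayed with a small simulation. This is a minimal sketch of a direct-mapped cache that only tracks tags (no data); the function name `simulate_direct_mapped` is hypothetical, and byte addressing is assumed as in the problem.

```python
def simulate_direct_mapped(addresses, num_lines, block_size):
    """Return 'hit' or 'miss' for each address; the cache starts empty."""
    tags = [None] * num_lines          # None marks an invalid (empty) cache block
    results = []
    for addr in addresses:
        block = addr // block_size     # main memory block number
        line = block % num_lines       # cache block this memory block maps to
        tag = block // num_lines       # high-order bits kept for identification
        if tags[line] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[line] = tag           # the new block replaces the old one
    return results

# 2K-byte cache, 64-byte blocks -> 32 cache blocks
print(simulate_direct_mapped([128, 144, 2176, 2180, 128, 2176], 32, 64))
```

Addresses 128 and 2176 fall in the same cache block with different tags, so they keep evicting each other.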
Example on Cache Placement & Misses
• Consider a small direct-mapped cache with 32 blocks
- Main memory of size 4GB
– Cache is initially empty, Block size = 16 bytes
– The following memory addresses (in decimal) are referenced:
2000, 2004, 2008, 3548, 3552, 3556.
– Map addresses to cache blocks and indicate whether hit or miss
• Solution: address format is Tag = 23 bits, Index = 5 bits, Offset = 4 bits

2000 = 0x7D0   cache index = 0x1D   Miss (first access)
2004 = 0x7D4   cache index = 0x1D   Hit
2008 = 0x7D8   cache index = 0x1D   Hit
3548 = 0xDDC   cache index = 0x1D   Miss (different tag)
3552 = 0xDE0   cache index = 0x1E   Miss (first access)
3556 = 0xDE4   cache index = 0x1E   Hit


Numericals on Direct Mapping
Direct Mapping Question: Assume a computer has 32 bit addresses. Each block stores 16
words. A direct-mapped cache has 256 blocks. In which block (line) of the cache would we
look for each of the following addresses? Addresses are given in hexadecimal for
convenience.
a. 1A2BC012 b. FFFF00FF c. 12345678 d. C109D532

Block size = 16 words = 2^4 words
Therefore, Number of bits in the Word field = 4
Number of cache blocks = 256 = 2^8
Therefore, Number of bits in the Block field = 8
Hence, in the 32-bit main memory address, the lowest-order 4 bits (the lowest-order
hexadecimal digit) represent the offset of the word inside a block.
The next 8 bits (the 2nd and 3rd lowest-order hexadecimal digits) represent the block no
in the cache to which the incoming address will be mapped.

So, in the address 1A2BC012, 01 is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 01 in the cache for a hit or a miss.
So, in the address FFFF00FF, 0F is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 0F in the cache for a hit or a miss.

So, in the address 12345678, 67 is the block no in the cache, to which this address will be
mapped to, i.e., we will check block no 67 in the cache for a hit or a miss.

So, in the address C109D532 , 53 is the block no in the cache, to which this address will
be mapped to, i.e., we will check block no 53 in the cache for a hit or a miss.
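The same digit-picking can be done numerically. A short sketch, assuming word addressing with 16 words per block and 256 cache blocks as in the question; the function name `cache_line` is hypothetical.

```python
def cache_line(address, block_words=16, num_lines=256):
    """Cache block (line) that a word address maps to in a direct-mapped cache."""
    return (address // block_words) % num_lines   # drop the word offset, keep 8 block bits

for a in (0x1A2BC012, 0xFFFF00FF, 0x12345678, 0xC109D532):
    print(f"{a:08X} -> block {cache_line(a):02X}")
```

Dividing by 16 discards the lowest hexadecimal digit, and the modulo keeps the next two digits.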
Numericals
Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main
memory is 128 KB. Find the number of bits in tag field and the size of the tag directory.

Solution:
Assumption: The memory is byte addressable.
The size of the main memory = 128 KB = 2^17 B.
Hence, the no of bits in the main memory address is 17 bits.
Block size = 256 Bytes = 2^8 Bytes
Therefore, Number of bits in the Word field = 8
# of Blocks in the cache = Size of the cache memory / Block size
= 16KB / 256B = 2^14 / 2^8 = 2^6
i.e., 6 bits are required to represent a block no in the cache

# of tag bits = total address length - (block field length + word field length )
=17-(6+8)
=3 bits
For each block in the cache, tag bits are stored.


Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 64 x 3 bits
= 192 bits
= 24 Bytes.
Numericals
Consider a direct mapped cache of size 256 KB with block size 1 KB. There are 8 bits in
the tag. Find the size of main memory and the size of the tag directory.
Solution:

Assumption: The memory is byte addressable.


Block size = 1 KB = 2^10 Bytes
Therefore, Number of bits in the Word field = 10
# of Blocks in the cache = Size of the cache memory / Block size
= 256KB / 1KB = 2^18 / 2^10 = 2^8
i.e., 8 bits are required to represent a block no in the cache
# of tag bits = 8 bits
total address length = # of tag bits + block field length + word field length
= 8 + 8 + 10 = 26 bits
The size of main memory = 2^26 bytes = 64 MB

Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 256 x 8 bits
= 256 Bytes
Numericals
Consider a direct mapped cache with block size 2 KB. The size of main memory is 64 GB
and there are 10 bits in the tag. Find the size of cache memory and the size of tag directory.
Solution:

Assumption: The memory is byte addressable.


Block size = 2KB = 2^11 Bytes
Therefore, Number of bits in the Word field = 11
The size of the main memory = 64 GB = 2^36 B.
Hence, the no of bits in the main memory address is 36 bits.

# of tag bits = 10 bits

block field length = total address length - ( # of tag bits + word field length ) = 36-(10+11)
= 15 bits
The size of cache memory = # of blocks x size of each block = 2^15 x 2^11 Bytes = 2^26 B
= 64 MB

Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 2^15 x 10 bits
= 40960 Bytes
Numericals
Consider a machine with a byte addressable main memory of 2^32 bytes divided into blocks
of size 32 bytes. Assume that a direct mapped cache having 1K cache lines is used with
this machine. What is the size of the tag field ?
Solution:
Assumption: The memory is byte addressable.
Block size = 32 bytes = 2^5 Bytes
Therefore, Number of bits in the Word field = 5
# of Blocks in the cache = 1K = 2^10
i.e., 10 bits are required to represent a block no in the cache
The size of the main memory = 2^32 B.
Hence, the no of bits in the main memory address is 32 bits.

# of tag bits = total address length - (block field length + word field length )
=32-(10+5)
=17 bits
Numericals
A 16 KB direct-mapped write back cache is organized as multiple blocks, each of size 16
bytes. The processor generates 32 bit addresses. The cache controller maintains the tag
information for each cache block, comprising the following:
1 valid bit, 1 modified bit, 2 replacement bits and as many bits as the minimum needed
to identify the memory block mapped in the cache.
What is the total size of memory needed at the cache controller to store meta data (tags) for
the cache?
Solution:
Assumption: The memory is byte addressable.
Block size = 16 bytes = 2^4 Bytes
Therefore, Number of bits in the Word field = 4
# of Blocks in the cache = Size of the cache memory / Block size
= 16KB / 16B = 2^14 / 2^4 = 2^10
i.e., 10 bits are required to represent a block no in the cache

# of tag bits = total address length - (block field length + word field length )
=32-(10+4)
=18 bits
Size of the memory required to store the meta data= # of blocks in the cache x ( tag bits +
1valid bit + 1 modified bit + 2 replacement bits)

= 2^10 x (18 + 1 + 1 + 2) bits
= 2^10 x 22 bits
= 2816 Bytes
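The metadata calculation can be reproduced step by step. This is a sketch of the arithmetic only, assuming power-of-two sizes; the variable names are illustrative.

```python
cache_bytes, block_bytes, address_bits = 16 * 1024, 16, 32

num_blocks = cache_bytes // block_bytes             # 2^10 cache blocks
offset_bits = block_bytes.bit_length() - 1          # log2(block size) = word field
index_bits = num_blocks.bit_length() - 1            # log2(no of blocks) = block field
tag_bits = address_bits - index_bits - offset_bits  # remaining bits form the tag

# per block: tag + 1 valid bit + 1 modified bit + 2 replacement bits
metadata_bits = num_blocks * (tag_bits + 1 + 1 + 2)
print(tag_bits, metadata_bits // 8)                 # tag width, metadata size in bytes
```

Changing `cache_bytes`, `block_bytes` or `address_bits` reproduces the other direct-mapped tag directory problems in this section (with the extra status bits set to zero where none are stated).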
Fully Associative Cache
A block can be placed anywhere in the cache, i.e., there is no fixed/specific position in
the cache for a block of the main memory.

In this mapping, replacement of a block is not required until the cache is completely full.

But to find a main memory block i in the cache, the tag of the incoming block needs to be
compared with all the tags stored in the cache. The search time is therefore higher.
Set-Associative Cache
Here, the position of a main memory block is neither completely fixed (as in direct
mapping) nor completely free (as in associative mapping).

A main memory block can occupy any of the free blocks from a specific set of
blocks.
Cache memory is organized as a collection of sets of blocks. There may be 2 blocks per set,
4 blocks per set, or in general m blocks per set. The number of blocks present in a
set is called the way number.
Here a main memory block i will be mapped to a specific set j, where
j = i mod (no of sets in the cache)

The main memory block i can be placed in any of the blocks present in the
set j.
To find a main memory block i in the cache, the tag of block i is compared
only with the tags of the blocks present in the set j.
Set-Associative Cache
Replacement of a block is required only when all the blocks in a set are occupied
and an incoming block is mapped into that set.

For direct mapped cache, way number is 1.

For associative mapped cache, way number is the number of blocks in the
cache.

Here, main memory address will be divided into 3 parts: tag, set, word.

To find the number of bits in the set field, first find the no of sets in the cache: no
of blocks in the cache / way no. Then express that result as a power of 2
and take the exponent as the length of the set field.

To find the number of bits in the tag field, first find the number of blocks in the main
memory / the no of sets in the cache. Then express that result as a power of 2
and take the exponent as the length of the tag field.
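The field-length rules above can be written as one small function. This is a minimal sketch, assuming every quantity is a power of two; the name `address_fields` is hypothetical. Setting the way number to 1 gives direct mapping, and setting it to the number of cache blocks gives associative mapping.

```python
def address_fields(address_bits, cache_blocks, block_size, ways):
    """Return (tag bits, set bits, word bits) for a set-associative cache."""
    num_sets = cache_blocks // ways
    word_bits = block_size.bit_length() - 1   # log2(block size)
    set_bits = num_sets.bit_length() - 1      # log2(no of sets); 0 when there is one set
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits

# 16-bit address, 32 cache blocks of 64 bytes each
print(address_fields(16, 32, 64, ways=1))    # direct mapped
print(address_fields(16, 32, 64, ways=2))    # 2-way set associative
print(address_fields(16, 32, 64, ways=32))   # fully associative
```

As the way number doubles, one bit moves from the set field to the tag field.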
Set Associative Mapped Cache
Problem
A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a 2-way
set associative manner with 64 bytes per cache block. Assume that the size of each memory
word is 1 byte.

(a)Calculate the number of bits in each of the Tag, set, and Word fields of the memory address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:

128, 144, 2176, 2180, 128, 2176

All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, Number of bits in the Word field = 6

Cache size = 2K-byte = 2^11 bytes

Number of cache blocks = Cache size / Block size = 2^11 / 2^6 = 2^5
Number of cache sets = total no of Cache blocks / way no = 2^5 / 2 = 2^4

Therefore, Number of bits in the Set field = 4

Total number of address bits = 16


Therefore, Number of bits in the Tag field = 16 - 6 - 4 = 6

For a given 16-bit address, the 6 most significant bits, represent the Tag, the next 4 bits
represent the Set, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.

Access # 1:
Address = (128)10 = (0000000010000000)2
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)

For this address, Tag = 000000, Set = 0010, Word = 000000

Since the cache is empty before this access, this will be a cache miss

After this access, Tag field for the first block in the cache set 0010 is set to 000000
Problem: Solution

Access # 2:
Address = (144)10 = (0000000010010000)2

For this address, Tag = 000000, Set = 0010, Word = 010000

Since tag field for the first cache block in the set 0010 is 000000 before this access, this
will be a cache hit (because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)10 = (0000100010000000)2

For this address, Tag = 000010, Set = 0010, Word = 000000

The tag field for this address does not match the tag field for the first block in set 0010.
The second block in set 0010 is empty. Therefore, this access will be a cache miss.
After this access, Tag field for the second block in set 0010 is set to 000010

Access # 4:
Address = (2180)10 = (0000100010000100)2

For this address, Tag = 000010, Set = 0010, Word = 000100

Since tag field for the 2nd cache block in the set 0010 is 000010 before this access, this
will be a cache hit (address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)10 = (0000000010000000)2

For this address, Tag = 000000, Set = 0010, Word = 000000

The tag field for this address matches the tag field for the first block in set 0010. Therefore,
this access will be a cache hit.

Access # 6:
Address = (2176)10 = (0000100010000000)2

For this address, Tag = 000010, Set = 0010, Word = 000000

The tag field for this address matches the tag field for the second block in set 0010.
Therefore, this access will be a cache hit.
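The same trace can be replayed on a set-associative model. This is a sketch only: the problem does not state a replacement policy, so LRU is an assumption here (no replacement is actually triggered by this trace), and the function name `simulate_set_associative` is hypothetical.

```python
from collections import OrderedDict

def simulate_set_associative(addresses, num_sets, ways, block_size):
    """Return 'hit' or 'miss' per address; tags only, LRU replacement (assumed)."""
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags, oldest first
    results = []
    for addr in addresses:
        block = addr // block_size
        tags = sets[block % num_sets]    # only this set is searched
        tag = block // num_sets
        if tag in tags:
            tags.move_to_end(tag)        # mark as most recently used
            results.append("hit")
        else:
            if len(tags) == ways:
                tags.popitem(last=False) # set is full: evict least recently used
            tags[tag] = True
            results.append("miss")
    return results

# 2K-byte cache, 64-byte blocks, 2-way -> 16 sets
print(simulate_set_associative([128, 144, 2176, 2180, 128, 2176], 16, 2, 64))
```

Compared with the direct-mapped version of this problem, the last two accesses become hits because both conflicting blocks fit in set 0010 together.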
Numericals
Consider a fully associative mapped cache of size 16 KB with block size 256 bytes. The
size of main memory is 128 KB. Find the number of bits in tag field and the size of the tag
directory.
Solution:
Assumption: The memory is byte addressable.
The size of the main memory = 128 KB = 2^17 B.
Hence, the no of bits in the main memory address is 17 bits.
Block size = 256 Bytes = 2^8 Bytes
Therefore, Number of bits in the Word field = 8
# of tag bits = total address length - word field length
= 17 - 8
= 9 bits

# of Blocks in the cache = Size of the cache memory / Block size
= 16KB / 256B = 2^14 / 2^8 = 2^6
Size of the tag directory = no of blocks in the cache x no of bits in the tag field
= 64 x 9 bits = 576 bits = 72 Bytes
Numericals
Consider a fully associative cache of size 256 KB with block size 1 KB. There are 18 bits
in the tag. Find the size of main memory and the size of the tag directory.
Solution:

Assumption: The memory is byte addressable.


Block size = 1 KB = 2^10 Bytes
Therefore, Number of bits in the Word field = 10
# of Blocks in the cache = Size of the cache memory / Block size
= 256KB / 1KB = 2^18 / 2^10 = 2^8
i.e., the cache holds 256 blocks
# of tag bits = 18 bits
total address length = # of tag bits + word field length
= 18 + 10 = 28 bits
The size of main memory = 2^28 bytes = 256 MB

Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 256 x 18 bits
= 4608 bits = 576 Bytes
Numericals
Consider a fully associative mapped cache with block size 4 KB. The size of main
memory is 16 GB. Find the number of bits in tag.
Solution:

Assumption: The memory is byte addressable.


Block size = 4KB = 2^12 Bytes
Therefore, Number of bits in the Word field = 12
The size of the main memory = 16 GB = 2^34 B.
Hence, the no of bits in the main memory address is 34 bits.

# of tag bits = total address length - word field length = 34 - 12


=22 bits
Numericals
Cache/Memory Layout: A computer has an 8 GByte memory with 64 bit word sizes. Each
block of memory stores 16 words. The computer has a direct-mapped cache of 128 blocks.
The computer uses word level addressing. What is the address format? If we change the
cache to a 4-way set associative cache, what is the new address format?
Solution:
Assumption: The memory is WORD addressable.
Word size = 64 bits = 8 bytes = 2^3 bytes
Block size = 16 words = 2^4 words
Therefore, Number of bits in the Word field = 4
The size of the main memory = 8 GB = 2^33 B = 2^33 / 2^3 words = 2^30 words.
Hence, the no of bits in the main memory (word) address is 30 bits.

# of blocks in the cache = 128 = 2^7

Main memory address : 30 bits.
word field : 4 bits
block field : 7 bits
Tag field : 30 - (4+7) bits = 19 bits
Solution: Part 2: For 4-way set associative cache
Assumption: The memory is WORD addressable.
Word size = 64 bits = 8 bytes = 2^3 bytes
Block size = 16 words = 2^4 words
Therefore, Number of bits in the Word field = 4
The size of the main memory = 8 GB = 2^33 B = 2^33 / 2^3 words = 2^30 words.
Hence, the no of bits in the main memory (word) address is 30 bits.

# of blocks in the cache = 128 = 2^7
# of sets in the cache = 2^7 / 4 = 2^5

Main memory address : 30 bits.
word field : 4 bits
set field : 5 bits
Tag field : 30 - (4+5) bits = 21 bits
Numericals
A two-way set associative cache memory uses block of 4 words. The
cache can have a total of 2048 words from main memory. The main
memory size is 128K X 32.
i) Draw the format of the main memory address.
ii) What is the size of the cache including tag bits?
Solution:
Assumption: The memory is WORD addressable.
Block size = 4 words = 2^2 words
Therefore, Number of bits in the Word field = 2
The size of the main memory = 128K x 32 = 2^17 words.
Hence, the no of bits in the main memory address is 17 bits.

# of blocks in the cache = No of words / block size = 2^11 / 2^2 = 2^9
# of sets in the cache = 2^9 / 2 = 2^8

Main memory address : 17 bits.
word field : 2 bits
set field : 8 bits
Tag field : 17 - (2+8) bits = 7 bits

The size of cache with tag bits = No of blocks x (no of tag bits + block size in bits)
= 2^9 x (7 + 4 x 32) bits
= 512 x 135 bits
= 69,120 bits = 8,640 Bytes
Numericals
Consider a 4-way set associative mapped cache of size 16 KB with block size 512 bytes.
The size of main memory is 256 KB. Find the number of bits in tag field and the size of the
tag directory.
Solution:
Assumption: The memory is byte addressable.
Block size = 512 B = 2^9 Bytes
Therefore, Number of bits in the Word field = 9
# of Blocks in the cache = Size of the cache memory / Block size
= 16KB / 512B = 2^14 / 2^9 = 2^5

# of Sets in the cache = No of blocks in the cache memory / Way no (# of blocks in a set)
= 2^5 / 4 = 2^3

The size of main memory = 256KB = 2^18 bytes

total address length = # of tag bits + # of set bits + word field length
# of tag bits = 18 -( 3+9) = 6 bits
Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 32 x 6 bits
= 24 Bytes
Numericals
Consider a 4-way set associative mapped cache of size 512 KB with block size 1 KB.
There are 9 bits in the tag. Find the size of main memory and the size of the tag directory

Solution:
Assumption: The memory is byte addressable.
Block size = 1KB = 2^10 Bytes
Therefore, Number of bits in the Word field = 10
# of Blocks in the cache = Size of the cache memory / Block size
= 512KB / 1KB = 2^19 / 2^10 = 2^9

# of Sets in the cache = No of blocks in the cache memory / Way no (# of blocks in a set)
= 2^9 / 4 = 2^7

total address length = # of tag bits + # of set bits + word field length
= 9 + 7 + 10 = 26 bits
The size of main memory = 2^26 bytes = 64 MB

Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 512 x 9 bits
= 576 Bytes
Numericals
Consider a 4-way set associative mapped cache with block size 4 KB. The size of main
memory is 16 GB and there are 10 bits in the tag. Find the size of cache memory and the
size of the tag directory.
Solution:
Assumption: The memory is byte addressable.
Block size = 4 KB = 2^12 Bytes
Therefore, Number of bits in the Word field = 12
The size of main memory = 16GB = 2^34 bytes

total address length = # of tag bits + # of set bits + word field length
# of set bits = 34 - (10 + 12) = 12 bits

Size of the cache memory = # of sets in the cache x way no x size of a block
= 2^12 x 4 x 2^12 Bytes = 2^26 Bytes = 64 MB

Size of the tag directory = no of blocks in the cache x no of bits in the tag field.
= 2^12 x 4 x 10 bits
= 20,480 Bytes
Numericals on Mapping
A cache consists of a total of 128 blocks. The main memory contains 2K
blocks, each consisting of 32 words.
( I )How many bits are there in each of the TAG, BLOCK and WORD
field in case of direct mapping?
( ii )How many bits are there in each of the TAG, SET, and WORD
field in case of 4-way set-associative mapping?

Solution:
Assumption: The memory is word addressable.
Block size = 32 words = 2^5 words
Therefore, Number of bits in the Word field = 5
# of Blocks in the cache = 128 = 2^7
i.e., 7 bits are required to represent a block no in the cache
# of Blocks in the main memory = 2K = 2^11

To find the tag bits, # of Blocks in the main memory / # of Blocks in the cache
= 2^11 / 2^7 = 2^4
The no of tag bits = 4
Solution: 2nd Part


Assumption: The memory is word addressable.
Block size = 32 words = 2^5 words
Therefore, Number of bits in the Word field = 5
# of Blocks in the cache = 128 = 2^7
# of sets in the cache = # of Blocks in the cache / way no = 2^7 / 4 = 2^5
i.e., 5 bits are required to represent a set no in the cache
# of Blocks in the main memory = 2K = 2^11
To find the tag bits, # of Blocks in the main memory / # of sets in the cache
= 2^11 / 2^5 = 2^6
The no of tag bits = 6
Numericals
Consider a 16-way set associative mapped cache. The size of cache memory is 512 KB and
there are 11 bits in the tag. Find the size of main memory.

Solution:

Assumption: The memory is byte addressable.


Let the no of sets in the cache = 2^x
Let the size of a block = 2^y Bytes
Size of the cache memory = No of sets x way no x block size
2^19 = 2^x x 2^4 x 2^y
=> x+y = 15......................<1>

Length of the main memory address = No of tag bits + bits in the set field + no of bits in
word field
= 11 + x + y
Now replacing the value of x+y from equation <1>, we get
Length of the main memory address = 11 + 15 = 26 bits
Size of the main memory = 2^26 Bytes = 64 MB
Numericals
Consider a 4-way set associative mapped cache. The size of main memory is 64 MB and
there are 11 bits in the tag. Find the size of cache memory.

Solution:

Assumption: The memory is byte addressable.


Let the no of sets in the cache = 2^x
Let the size of a block = 2^y Bytes

Size of the main memory = 64MB = 2^26 B

total address length = # of tag bits + # of set bits + word field length
26 = 11 + x + y
=> x+y = 15................................<1>

Size of the cache memory = No of sets x way no x block size
= 2^x x 2^2 x 2^y
= 2^(2+x+y)

Now replacing the value of x+y from equation <1>, we get
Size of the cache memory = 2^(2+15) Bytes
= 2^17 Bytes = 128 KB
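The last two problems are inverses of each other: the set bits and word bits are never found separately, only their sum. A short sketch of both directions, assuming byte addressing and power-of-two sizes; the function names are hypothetical.

```python
def cache_size_bytes(main_memory_bytes, tag_bits, ways):
    """Cache size when main memory size, tag width and way number are known."""
    address_bits = main_memory_bytes.bit_length() - 1   # log2(main memory size)
    set_plus_word_bits = address_bits - tag_bits        # x + y in the derivation
    return ways * 2 ** set_plus_word_bits               # sets x ways x block size

def main_memory_bytes(cache_bytes, tag_bits, ways):
    """Main memory size when cache size, tag width and way number are known."""
    set_plus_word_bits = (cache_bytes // ways).bit_length() - 1
    return 2 ** (tag_bits + set_plus_word_bits)

print(cache_size_bytes(64 * 2**20, tag_bits=11, ways=4) // 1024, "KB")
print(main_memory_bytes(512 * 1024, tag_bits=11, ways=16) // 2**20, "MB")
```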
Numericals
A computer has a 256 KB, 4-way set associative, write back data cache with block size of
32 bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag
directory entry contains, in addition to the address tag, 2 valid bits, 1 modified bit and 1
replacement bit. What is the width of the tag field and the size of the tag directory?
Solution:
Assumption: The memory is byte addressable.
Block size = 32 B = 2^5 Bytes
Therefore, number of bits in the word field = 5
# of sets in the cache = No of blocks in the cache memory / Way no (# of blocks in a set)
= (Size of the cache memory / Block size) / Way no
= (256 KB / 32 B) / 4
= (2^18 / 2^5) / 2^2 = 2^13 / 2^2 = 2^11
Therefore, number of bits in the set field = 11
total address length = # of tag bits + # of set bits + word field length
# of TAG bits = 32 - (11 + 5) = 16 bits
Width of a tag directory entry = address tag bits + 2 valid bits + 1 modified bit + 1 replacement bit
= 16 + 2 + 1 + 1 = 20 bits
Size of the tag directory = no of blocks in the cache x no of bits in each entry
= 2^13 x 20 bits = 163,840 bits
= 20,480 Bytes
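The field-width arithmetic above can be checked with a short Python sketch. The function name cache_fields and its parameters are illustrative only; it assumes a byte-addressable memory and power-of-two sizes, as in the solution.

```python
def cache_fields(cache_bytes, block_bytes, ways, address_bits):
    """Return (tag_bits, set_bits, word_bits, num_blocks) for a set-associative cache."""
    word_bits = block_bytes.bit_length() - 1          # log2(block size)
    num_blocks = cache_bytes // block_bytes
    set_bits = (num_blocks // ways).bit_length() - 1  # log2(number of sets)
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits, num_blocks

# 256 KB, 4-way, 32-byte blocks, 32-bit addresses (values from the problem)
tag_bits, set_bits, word_bits, num_blocks = cache_fields(256 * 1024, 32, 4, 32)
entry_bits = tag_bits + 2 + 1 + 1    # tag + 2 valid + 1 modified + 1 replacement bit
directory_bytes = num_blocks * entry_bits // 8
print(tag_bits, set_bits, word_bits)
print(directory_bytes)
```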
Numericals
Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block requests
are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the sequence?
3, 18, 20, 30
Also, calculate the hit ratio and miss ratio.
Numericals
Consider a fully associative cache with 8 cache blocks (0-7). The memory block requests
are in the order-
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7?
Also, calculate the hit ratio and miss ratio.
Numericals
Consider a 4-way set associative mapping with 16 cache blocks. The memory block
requests are in the order-

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155

If LRU replacement policy is used, which of the following memory blocks will not be present in the cache?

3
8
129
216

Also, calculate the hit ratio and miss ratio.
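The three mapping problems above can be traced with one small simulator. This is only a sketch under stated assumptions: memory block b maps to set (b mod number of sets), empty slots are filled in order, and replacement is LRU. Direct mapping is the special case of 1 way per set, and fully associative mapping is the special case of a single set. The name simulate_cache is hypothetical.

```python
def simulate_cache(requests, num_sets, ways):
    """Set-associative cache with LRU replacement; returns (slots, hits)."""
    slots = [[None] * ways for _ in range(num_sets)]
    last_used = [[-1] * ways for _ in range(num_sets)]
    hits = 0
    for time, block in enumerate(requests):
        s = block % num_sets
        if block in slots[s]:
            way = slots[s].index(block)                   # hit
            hits += 1
        elif None in slots[s]:
            way = slots[s].index(None)                    # fill the first empty slot
        else:
            way = last_used[s].index(min(last_used[s]))   # evict the LRU slot
        slots[s][way] = block
        last_used[s][way] = time
    return slots, hits

direct = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24]
fully = [4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7]
four_way = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]

for name, reqs, sets, ways in [("direct mapped", direct, 8, 1),
                               ("fully associative", fully, 1, 8),
                               ("4-way set associative", four_way, 4, 4)]:
    final_slots, hit_count = simulate_cache(reqs, sets, ways)
    print(name, final_slots, f"hit ratio = {hit_count}/{len(reqs)}")
```

The miss ratio in each case is 1 minus the printed hit ratio.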


Cache Coherence Problem
A bit called the “valid bit” is provided for each cache block.

If the block contains valid data, then the bit is set to 1, else it is 0.

Valid bits are set to 0 when the power is first turned on.

When a block is loaded into the cache for the first time, the valid bit is set to 1.

Data transfers between main memory and disk occur directly, bypassing the cache.
[DMA Technique]
Case 1: from Disk to Main memory
When a new block is transferred from disk to main memory block ‘i’ using the
DMA technique, and a copy of block ‘i’ also exists in the cache, the cache copy
becomes ‘stale’, because main memory block ‘i’ now contains different data.
So, the cache memory block ‘j’ corresponding to main memory block ‘i’ needs to be
invalidated, i.e., the valid bit of cache block ‘j’ is cleared to 0.
Cache Coherence Problem
Case 2: What happens if the data is transferred from the main memory to the disk
and the write-back protocol is being used?

In this case, the data in the cache may have been changed and is indicated by the dirty
bit.

The copies of the data in the cache, and the main memory are different. This is called the
cache coherence problem.

One option is to force a write-back of the dirty cache blocks before the DMA transfer takes place.
Interleaving
Divides the memory system into a number of memory modules. Each module has its
own address buffer register (ABR) and data buffer register (DBR).

Arranges addressing so that successive words in the address space are placed in
different modules.

When requests for memory access involve consecutive addresses, the access will be to
different modules.

Since parallel access to these modules is possible, the average rate of fetching words
from the Main Memory can be increased.
[Figure: 3-bit word addresses 000 through 111 used to illustrate the address layouts]
Methods of address layouts

CASE 1: Consecutive words are placed in a module.

High-order k bits of a memory address determine the module.


Low-order m bits of a memory address determine the word within a module.
When a block of words is transferred from main memory to cache, only one module
is busy at a time.
Methods of address layouts

Case 2: Consecutive words are located in consecutive modules.

Lower-order k bits of a memory address determine the module.


High-order m bits of a memory address determine the word within a module.
While transferring a block of data, several memory modules can be kept busy at the
same time.
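The two address layouts can be compared with a minimal Python sketch. The function name decode and the choice of k = 2 module bits and m = 1 word bit (4 modules of 2 words, matching 3-bit addresses) are assumptions made only for illustration.

```python
def decode(address, k, m, consecutive_in_module):
    """Return (module, word_within_module) for a (k + m)-bit word address."""
    if consecutive_in_module:
        # Case 1: high-order k bits select the module, low-order m bits the word.
        return address >> m, address & ((1 << m) - 1)
    # Case 2: low-order k bits select the module, high-order m bits the word.
    return address & ((1 << k) - 1), address >> k

k, m = 2, 1   # hypothetical sizes: 4 modules, 2 words per module
for addr in range(1 << (k + m)):
    print(f"{addr:03b}", decode(addr, k, m, True), decode(addr, k, m, False))
```

In Case 2, addresses 000, 001, 010 and 011 fall in four different modules, which is why a block transfer can keep several modules busy at once.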
Numericals
Find the block transfer time of one interleaved memory where the modules are divided to
accommodate even and odd numbered words of blocks, each block contains 4 words, the
address transfer time is 2ns, 1st word access time is 4ns, consecutive word access time is
3ns and data transfer time is 1ns. Explain the answer with the diagram of the above
interleaved memory
Numericals
Explain the technique of memory interleaving. Consider a memory with 8 words per
block. Suppose 2 clock cycles are required to transfer the address from the CPU to main
memory, 8 clock cycles to access the 1st word, 4 clock cycles for each consecutive word, and 1
clock cycle to transfer a word from memory to cache. Calculate the total number of clock
cycles required to transfer the block with interleaving and without interleaving if the
number of modules is four.
Virtual Memory
Bindu Agarwalla

Introduction to Virtual Memory
An important challenge in the design of a computer system is to provide a large, fast
memory system at an affordable cost.

Cache memories were developed to increase the effective speed of the memory system.

Virtual memory is an architectural solution to increase the effective size of the memory
system.

Large programs that cannot fit completely into the main memory have their parts stored
on secondary storage devices such as magnetic disks.
▪ Pieces of programs must be transferred to the main memory from secondary storage
before they can be executed.

When a new piece of a program is to be transferred to the main memory, and the
main memory is full, then some other piece in the main memory must be replaced.
▪ Recall this is very similar to what we studied in case of cache memories.
Introduction to Virtual Memory
Operating system automatically transfers data between the main memory and secondary
storage.
▪ Application programmer need not be concerned with this transfer.
▪ Also, application programmer does not need to be aware of the limitations imposed
by the available physical memory.

Techniques that automatically move program and data between main memory and
secondary storage when they are required for execution are called virtual-memory
techniques.

Programs and processors reference an instruction or data independent of the size of the
main memory.

Processor issues binary addresses for instructions and data.


▪ These binary addresses are called logical or virtual addresses.
Introduction to Virtual Memory
Virtual addresses are translated into physical addresses by a combination of hardware and
software subsystems.
▪ If virtual address refers to a part of the program that is currently in the main
memory, it is accessed immediately.
▪ If the address refers to a part of the program that is not currently in the main
memory, it is first transferred to the main memory before it can be used.
Virtual Memory Organization
Memory management unit (MMU) translates virtual addresses into physical
addresses.

If the desired data or instructions are in the main memory, they are fetched as
described previously.

If the desired data or instructions are not in the main memory, they must be
transferred from secondary storage to the main memory.

The MMU causes the operating system to bring the data from the secondary storage
into the main memory.

Fig. Virtual Memory Organization


Virtual Memory: Part 2
Paging
The concept of virtual memory is implemented using paging.

Paging is a non-contiguous memory allocation scheme.

The physical memory is divided into fixed-size blocks called frames.

The logical memory is divided into blocks of the same size called pages.

When a process is executed, its pages are loaded into the available memory
frames. The pages belonging to the process can be stored in any of the free frames;
they need not be contiguous.

The unit of allocation is the frame.


Paging Example
Paging ..
• A process consists of a number of pages, and the required data will be found
inside one page.

• To access an element, every address generated by the CPU is divided into two
parts:
• Page Number (p)
• Page Offset (d)

A particular page number (p) can be allocated to any frame (f) in the main
memory.
Paging ..
• As a process consists of many pages, they will be scattered
throughout the available frames in the main memory.

A data structure called the “Page Table” is maintained to record which frame (f) is
allocated to a page (p).

The index into the page table is the page number, and the entry at that
index is the frame no allocated to the page.
Address Translation: VA to PA
Paging ..
To generate a PA for a given LA, the page number (p) is used as an index into the page table
(PT).

The page table base register (PTBR) contains the base address of the PT.

PTBR's content is combined with the page number (p) to locate the corresponding entry
in the PT.

Let the entry be 'f'. So 'f' is the frame number for the given page no (p).

The frame no 'f' is combined with the offset 'd' to get the actual physical address (PA).

PA = mem[[PTBR] + p] x (size of a page/frame) + d


Example: LA to PA
Given, LA = 6, Page size = 4 Bytes
p = LA / page size (integer division)
= 6 / 4 = 1
d = LA % page size = 6 % 4 = 2
PA = PT[1] x 4 + 2

PA = 2 x 4 + 2 = 10 [As PT[1] = 2]

In Binary:

Given, LA = 6, Page size = 4 Bytes

LA = 6 = 110

p = 1, d = 10 [page size 4 = 2^2, so d is the low-order 2 bits]
PA = PT[1] followed by d
= 10 10 = 1010 [As PT[1] = 2 = 10]
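The translation steps can be written as a minimal Python sketch. Only PT[1] = 2 comes from the example; the function name translate and the other page table entries are made up for illustration.

```python
def translate(logical_address, page_table, page_size):
    """Map a logical address to a physical address via the page table."""
    p, d = divmod(logical_address, page_size)   # page number, page offset
    f = page_table[p]                           # frame allocated to page p
    return f * page_size + d

# PT[1] = 2 is from the example; entries 0 and 2 are hypothetical.
page_table = {0: 5, 1: 2, 2: 7}
print(translate(6, page_table, 4))
```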
Virtual Memory: TLB
Paging with TLB
With paging, accessing a byte from memory requires two memory references:
First, the page table is accessed to get the frame number.
Second, the byte is accessed at the generated physical address.

To overcome this slowdown, a special, small, fast-lookup hardware cache called a
translation look-aside buffer (TLB) is used.

Each entry in the TLB consists of two parts.


A key or tag
A value
Use of TLB in Paging

Paging with TLB
When an associative memory is presented with an item, the item is compared with all the
keys simultaneously.

If the item is found, the corresponding value field is returned.

The TLB contains only a few of PT entries.

When a LA is generated by the CPU, its page no is presented to the TLB.


Paging with TLB
If the page no is found, its frame no is immediately retrieved and used to access the memory.

If the page no is not found, a memory reference to the PT is made.

Once the frame no is obtained, it is used to generate the PA, and the page no and
frame no are entered into the TLB for quick future reference.
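The lookup sequence above can be sketched in Python. A dict stands in for the associative memory, and the sketch assumes an unbounded TLB with no replacement policy, which a real TLB does not have. The name lookup and the page table contents are hypothetical.

```python
def lookup(page_no, tlb, page_table, stats):
    """Return the frame number for page_no, consulting the TLB first."""
    if page_no in tlb:                 # TLB hit: no page-table access needed
        stats["hits"] += 1
        return tlb[page_no]
    stats["misses"] += 1               # TLB miss: one extra memory reference to the PT
    frame_no = page_table[page_no]
    tlb[page_no] = frame_no            # remember the pair for future references
    return frame_no

page_table = {0: 2, 4: 15, 6: 5, 7: 9}   # hypothetical page -> frame mapping
tlb, stats = {}, {"hits": 0, "misses": 0}
frames = [lookup(p, tlb, page_table, stats) for p in (7, 7, 0, 7)]
print(frames, stats)
```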
Use of TLB in Paging..

If TLB is maintained process wise, then for every context switch, it has to be
flushed.

To avoid this , some TLBs store address space identifier (ASIDs) in each
TLB entry.

An ASID uniquely identifies each process.


Practice Questions
Assuming a 1 KB page size, what are the page nos and
offsets for the following address references (provided
as decimal nos)?

A. 3085 B. 42095 C. 215201 D. 650000 E. 2000001
Ans (for A):
For page no [p]: LA / Page size (integer division)
= 3085 / 1024
= 3
For offset [d] = LA % Page size
= 3085 % 1024
= 13
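The same division and remainder apply to every address in the question. A short Python sketch (PAGE_SIZE is an illustrative name):

```python
PAGE_SIZE = 1024   # 1 KB pages, as stated in the question

addresses = [3085, 42095, 215201, 650000, 2000001]
results = [divmod(la, PAGE_SIZE) for la in addresses]   # (page no, offset) pairs
for label, la, (p, d) in zip("ABCDE", addresses, results):
    print(f"{label}. LA = {la}: page no = {p}, offset = {d}")
```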
Practice Questions
Consider a LA space of 256 pages with a 4KB page size, mapped onto a PM
of 64 frames.
a) How many bits are required in the logical address?

Answer : LA = p + d
To get 'p', express the no of pages as a power of 2; the exponent will
be 'p'
256 = 2^8
so, p = 8 bits
To get 'd', express the page size as a power of 2; the exponent
will be 'd'
4 KB = 2^12 B
so, d = 12 bits

Therefore LA = p + d = (8 + 12) bits = 20 bits


Practice Questions
Consider a LA space of 256 pages with a 4KB page size, mapped onto a PM
of 64 frames.
b) How many bits are required in the Physical address ?

Answer : PA = f + d
To get 'f', express the no of frames as a power of 2; the exponent will be 'f'
64 = 2^6
so, f = 6 bits
d will be the same for LA and PA
d = 12 bits
Therefore PA = f + d = (6 + 12) bits
= 18 bits
Practice Questions
Assume a program consists of 8 pages and a computer has 16 frames of
memory. A page consists of 4096 words and memory is word addressable.
Currently, page 0 is in frame 2, page 4 is in frame 15, page 6 is in frame 5
and page 7 is in frame 9. No other pages are in memory. Translate the
memory addresses below.
a.111000011110000 b.000000000000000

Answer : As there are 8 pages, 3 bits are required to represent a page
number.
As a page consists of 4096 (2^12) words, 12 bits are required to represent
the offset within a page.
111000011110000: here page no is 7 (111), and it is found in frame no 9 (1001)
So, the PA = Frame no followed by offset = 1001 000011110000
000000000000000: here page no is 0 (000), and it is found in frame no 2 (0010)
So, the PA = Frame no followed by offset = 0010 000000000000


Numericals on Virtual Memory
Effective Access Time using TLB
• Hit Ratio (h): The fraction of times that the requested page number is
found in the TLB.
• EMAT = h(TLB + MA) + (1-h)(TLB + 2 x MA), where TLB is the TLB access time
and MA is the memory access time.
• When we find a page in the TLB, the access time is the sum of the
time taken to access the TLB and the time taken to access the data from the
memory. This is the first part of the equation.
• When we miss the page in the TLB, the total time is the
sum of the time taken to access the TLB (miss) + the time taken to access the
PT + the time taken to access the data item. This is the second part of the equation.
Problem
Memory access time = 50 ns
Hit Ratio = 75%
TLB access time= 2ns

What is the effective memory access time if TLB is used ?

EMAT = h(TLB + MA) + (1-h)(TLB + 2 x MA)

As Hit Ratio is 75%, we have h = 0.75

Therefore ,

EMAT = .75 ( 2 + 50 ) + (1-0.75)( 2 + 2 x 50 ) ns


= .75 ( 52) + (0.25)(102) ns
= 64.50 ns
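The calculation can be checked with a short Python sketch of the EMAT formula. The function name emat is illustrative; all times are in ns.

```python
def emat(h, tlb_ns, ma_ns):
    """Effective memory access time with a TLB: h(TLB + MA) + (1-h)(TLB + 2 x MA)."""
    return h * (tlb_ns + ma_ns) + (1 - h) * (tlb_ns + 2 * ma_ns)

with_tlb = emat(0.75, 2, 50)
print(with_tlb)
```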
Problem
Memory access time = 50 ns
Hit Ratio = 75%
TLB access time = 2ns

What is the effective memory access time if TLB is not used ?

EMAT = h(TLB + MA) + (1-h)(TLB + 2 x MA)

As the TLB is not used, the TLB terms drop out: every access needs one memory
access for the page table and one memory access for the data.

Therefore ,

EMAT = 2 x MA
= 2 x 50 ns
= 100 ns
Problem
The size of virtual memory is 256G Bytes and the physical memory is 4G
Bytes. The page size is 8M Bytes. What would be the size of the page table,
assuming 6 bits are used as control bits in the page table?
Solution:
Virtual Memory = 256G Bytes = 2^8 x 2^30 Bytes = 2^38 Bytes
Logical address = 38 bits
Page size = 8M Bytes = 2^23 Bytes
Page no = (38-23) = 15 bits, Offset = 23 bits
Physical memory = 4G Bytes = 2^32 Bytes
Frame no = (32-23) = 9 bits
The page table has one entry per page, i.e., 2^15 entries, and each entry consists of
frame no (9 bits) + control bits (6 bits, given)

Size of the page table = no of entries x size of each entry
= 2^15 x (9+6) bits = 15 x 2^15 bits = 491,520 bits = 60 KB
Hit Rate and Miss Penalty
Hit Rate= Number of Successful attempts/Number of total attempts

Miss Rate= Number of Unsuccessful attempts/Number of total attempts

Miss Penalty= time required to bring desired information to cache

Average Access Time = tave = hC + (1-h)M, where h is the hit rate, C is the cache
access time and M is the miss penalty.


PROBLEM
What is the hit ratio of a cache memory if the cache memory access time
is 30ns, the main memory access time is 150ns and the average access
time is 42ns?

Is it possible to have a 100% hit ratio in a cache? Justify your answer.

What are the write policies of cache memory? Explain. In a cache
organization, if the cache memory has an access time of 8 nsec and a hit rate of
0.98, find the Average Memory Access Time (AMAT) for the whole
arrangement. Assume the access time for the main memory is 1.0 msec.
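A minimal Python sketch for the two numerical parts, using the average access time formula tave = hC + (1-h)M. It assumes the main memory access time given in each question is used directly as M (the time seen on a miss); the function names are illustrative.

```python
def average_access_time(h, c, m):
    """t_ave = h*C + (1 - h)*M, all times in the same unit."""
    return h * c + (1 - h) * m

def hit_ratio(t_ave, c, m):
    """Solve t_ave = h*C + (1 - h)*M for h."""
    return (m - t_ave) / (m - c)

h = hit_ratio(42, 30, 150)                        # times in ns
amat = average_access_time(0.98, 8, 1_000_000)    # 1.0 msec = 1,000,000 ns
print(h, amat)
```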
PROBLEM
Consider a computer C1, with no cache memory , that takes 10 cc to read from
memory.
Consider another computer C2, with cache memory and interleaved main memory,
which takes 17cc to transfer a block from memory to cache, on a cache miss.
It is found that 30% of the instructions executed are data reference instructions. The hit ratio
of the instruction cache is 95% and that of the data cache is 90%. The cache memory access time is
1cc for both the caches.
Find the improvement in performance due to the use of caches over the
non-cached computer.
PROBLEM
Consider a two level memory system such that the level-1 having hit ratio
75%. The level-1 memory is 20 times faster than level-2 memory. The
average access time of level-1 memory is 52ns. If the average access time
of level-2 memory is changed or increased by 20% of 52ns. Compute the
followings.
( i ) What is the access time of level-1 memory?
( ii )What is the new hit ratio?
( iii )What is the percentage of change in hit ratio?

h1=0.75 Tavg1=52ns

Tavg2=Tavg1+20% of 52=52+ 10.4=62.4ns

t2=20t1

Tavg1 = 0.75 × t1 + 0.25 × (20 × t1) = 5.75 × t1 = 52 ns


t1=9.04 ns
t2= 180.8 ns [20 x t1]
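Tavg2 = hN × t1 + (1 - hN) × t2

62.4 = hN × 9.04 + (1 - hN) × 180.8

hN = (180.8 - 62.4) / (180.8 - 9.04) = 118.4 / 171.76 ≈ 0.689

Percentage of change in hit ratio = (old - new)/old x 100%
= (0.75 - 0.689) / 0.75 x 100% ≈ 8.1%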

2-Level Cache

tave = h1t1 + (1-h1)[h2(tB+t1) + (1-h2)(tB’+tB+t1)]

PROBLEM
Consider a three level memory system with access times per word 20ns,
40ns, 100ns. Hit ratios are 0.7, 0.8 and 1 respectively. If the referred word
is not available in level1 get the two word block from level2 to level1 and
supply the desired word to the processor. If it is not available in level2
then get a 4 word block from level3 to level2 and transfer the associated
block from level2 to level1. Handover the desired word to processor from
level1. what is the average access time?
tave= h1t1 +(1-h1)[h2(tB+t1)+(1-h2)(tB’+tB+t1)]

h1=.7, h2=.8

t1= 20ns, tB= 40ns x 2= 80ns, tB’ = 100ns x 4 =400ns

tave = 0.7 x 20 + 0.3 x [0.8 x (80 + 20) + 0.2 x (400 + 80 + 20)]
= 14 + 0.3 x [80 + 100]
= 14 + 54 = 68 ns

Multiplication of Signed Integers
Booth’s Algorithm

Booth’s Algorithm
It treats both +ve and -ve multipliers in a uniform way.

M x 0011110 [30] = M x (2^5 - 2^1) [30 = 32 - 2 = 2^5 - 2^1]

We know that multiplying a number by 2^i is equivalent to shifting the number left
by i positions.

We know that multiplying a number by -2^i is equivalent to shifting the 2’s
complement of the number left by i positions.
Booth’s Algorithm
In general, in the Booth scheme,

-1 times the shifted multiplicand is selected when moving from 0 to 1 in the
multiplier.
+1 times the shifted multiplicand is selected when moving from 1 to 0, as the
multiplier is scanned from right to left.

Multiplier:  1  0  0  1  1  1  0  0  1  0  1  0  1  0  0
Recoded:    -1  0 +1  0  0 -1  0 +1 -1 +1 -1 +1 -1  0  0
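The recoding rule can be expressed as a short Python sketch. It assumes an implied 0 to the right of the least significant bit; the function name booth_recode is illustrative.

```python
def booth_recode(bits):
    """Booth-recode a multiplier given as a bit string, MSB first."""
    digits = []
    previous = 0                            # implied 0 to the right of the LSB
    for bit in reversed(bits):              # scan from right to left
        current = int(bit)
        digits.append(previous - current)   # 0 -> 1 gives -1, 1 -> 0 gives +1
        previous = current
    return digits[::-1]                     # return the digits MSB first

print(booth_recode("100111001010100"))
print(booth_recode("01011"))                # +11
```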
Booth’s Algorithm

Booth multiplier recoding table.


-13 x 11 using Booth’s Algorithm
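+13 = 01101, -13 = 10011
+11 = 01011, Booth-recoded as +1 -1 +1 0 -1
[+2^4 - 2^3 + 2^2 - 2^0 = 16 - 8 + 4 - 1 = 11]

Multiplicand (-13):          1  0  0  1  1
Recoded multiplier (+11):   +1 -1 +1  0 -1

0 0 0 0 0 0 1 1 0 1      -1 x (-13)
0 0 0 0 0 0 0 0 0         0
1 1 1 1 0 0 1 1          +1 x (-13)
0 0 0 1 1 0 1            -1 x (-13)
1 1 0 0 1 1              +1 x (-13)
-------------------
1 1 0 1 1 1 0 0 0 1

Ans = -(2’s comp of 1 1 0 1 1 1 0 0 0 1)
= -(0 0 1 0 0 0 1 1 1 1) = -10001111 = -143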
-7 x -11 using Booth’s Algorithm
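+7 = 00111, -7 = 11001
+11 = 01011, -11 = 10101, Booth-recoded as -1 +1 -1 +1 -1

Multiplicand (-7):           1  1  0  0  1
Recoded multiplier (-11):   -1 +1 -1 +1 -1

0 0 0 0 0 0 0 1 1 1      -1 x (-7)
1 1 1 1 1 1 0 0 1        +1 x (-7)
0 0 0 0 0 1 1 1          -1 x (-7)
1 1 1 1 0 0 1            +1 x (-7)
0 0 0 1 1 1              -1 x (-7)
-------------------
0 0 0 1 0 0 1 1 0 1

Ans = 0001001101 = 77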
Multiplication of Signed Integers
Booth’s Algorithm[Method 2]

Flowchart for Booth’s Algorithm
Booth’s Algorithm (-7 x 3) [+7=0111 -7=1001]
A (Register)  Q (Multiplier)  Q-1  M (Multiplicand)  Operation               Remark
0000          0011            0    1001              Initially
0111          0011            0                      A=A-M=A+2s(M)           1st Cycle
0011          1001            1                      Shift A, Q, Q-1 right
0011          1001            1                      No add/sub              2nd Cycle
0001          1100            1                      Shift A, Q, Q-1 right
1010          1100            1                      A=A+M                   3rd Cycle
1101          0110            0                      Shift A, Q, Q-1 right
1101          0110            0                      No add/sub              4th Cycle
1110          1011            0                      Shift A, Q, Q-1 right
Product = A Q = 1110 1011 = -21
Booth’s Algorithm (14 x -7) [+14=01110 -14=10010]
A (Register)  Q (Multiplier)  Q-1  M (Multiplicand)  Operation                 Remark
00000         11001           0    01110             Initially                 Count=5
10010         11001           0                      A=A-M=A+2s(M)
11001         01100           1                      Shift A, Q, Q-1 right     Count=4
00111         01100           1                      A=A+M (carry discarded)
00011         10110           0                      Shift A, Q, Q-1 right     Count=3
00011         10110           0                      No add/sub
00001         11011           0                      Shift A, Q, Q-1 right     Count=2
10011         11011           0                      A=A-M=A+2s(M)
11001         11101           1                      Shift A, Q, Q-1 right     Count=1
11001         11101           1                      No add/sub
11100         11110           1                      Shift A, Q, Q-1 right     Count=0
Product = A Q = 11100 11110 = -98
Booth’s Algorithm (-15 x -5) [+15=01111 -15=10001]
+5 = 00101, -5 = 11011
A (Register)  Q (Multiplier)  Q-1  M (Multiplicand)  Operation                           Remark
00000         11011           0    10001             Initially                           Count=5
01111         11011           0                      A=A-M=A+2s(M)
00111         11101           1                      Shift A, Q, Q-1 right               Count=4
00111         11101           1                      No add/sub
00011         11110           1                      Shift A, Q, Q-1 right               Count=3
10100         11110           1                      A=A+M
11010         01111           0                      Shift A, Q, Q-1 right               Count=2
01001         01111           0                      A=A-M=A+2s(M) (carry discarded)
00100         10111           1                      Shift A, Q, Q-1 right               Count=1
00100         10111           1                      No add/sub
00010         01011           1                      Shift A, Q, Q-1 right               Count=0
Product = A Q = 00010 01011 = +75
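The register-level procedure traced in the tables above can be sketched in Python. This is an illustrative sketch, not a hardware description: it assumes both operands fit in n bits of 2's complement and that the multiplicand is not the most negative n-bit value. The function name booth_multiply is hypothetical.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit 2's-complement integers using registers A, Q, Q-1 and M."""
    mask = (1 << n) - 1
    a, q, q_1 = 0, multiplier & mask, 0
    m = multiplicand & mask
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask                    # A = A - M
        elif pair == (0, 1):
            a = (a + m) & mask                    # A = A + M
        # Arithmetic shift right of the combined A, Q, Q-1
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))       # keep the sign bit of A
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):              # interpret the 2n bits as signed
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-7, 3, 4))
print(booth_multiply(14, -7, 5))
print(booth_multiply(-15, -5, 5))
```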
Analysis of Booth’s Algorithm

Multiplier:  1  0  0  1  1  1  0  0  1  0  1  0  1  0  0
Recoded:    -1  0 +1  0  0 -1  0 +1 -1 +1 -1 +1 -1  0  0
The general case.

Multiplier:  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
Recoded:    -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
The worst case.

Multiplier:  1  1  1  1  1  0  0  0  0  0  1  1  1  1  1
Recoded:     0  0  0  0 -1  0  0  0  0 +1  0  0  0  0 -1
The best case.


Bit Pair Recoding of Multipliers
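Multiplier 11010 with sign extension (left) and implied 0 (right):  1 | 1 1 0 1 0 | 0
Booth recoding:      0  0 -1 +1 -1  0
Bit-pair recoding:      0    -1    -2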

Bit Pair recoding halves the maximum number of summands .


Bit-Pair Recoding of Multipliers
Bit-Pair Recoding of Multipliers
+11 x -9
+11 = 01011, -11 = 10101
+9 = 01001, -9 = 10111
Bit-pair recoded multiplier (-9) will be: -1 +2 -1

Multiplicand (+11):  0 1 0 1 1

1 1 1 1 1 1 0 1 0 1      -1 x (+11)
0 0 0 1 0 1 1 0          +2 x (+11)
1 1 0 1 0 1              -1 x (+11)
-------------------
1 1 1 0 0 1 1 1 0 1

Ans = -(2’s comp of 1 1 1 0 0 1 1 1 0 1)
= -(0 0 0 1 1 0 0 0 1 1)
= -99

Rules used:
X (-2) = Take the 2’s comp of the operand, then shift left by 1 position
X (-1) = Take the 2’s comp of the operand
X (2) = Shift the operand by 1 bit position to the left
Bit-Pair Recoding of Multipliers
+18 x -11
+18 = 010010, -18 = 101110
+11 = 001011, -11 = 110101
Bit-pair recoded multiplier (-11) will be: -1 +1 +1

Multiplicand (+18):  0 1 0 0 1 0

0 0 0 0 0 0 0 1 0 0 1 0      +1 x (+18)
0 0 0 0 0 1 0 0 1 0          +1 x (+18)
1 1 1 0 1 1 1 0              -1 x (+18)
-----------------------
1 1 1 1 0 0 1 1 1 0 1 0

Ans = -(2’s comp of 1 1 1 1 0 0 1 1 1 0 1 0)
= -(0 0 0 0 1 1 0 0 0 1 1 0)
= -198
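Bit-pair recoding can be sketched in Python as follows. Each pair of multiplier bits, together with the bit to its right, selects one multiple from -2, -1, 0, +1, +2. The sketch assumes an even number of bits n (sign-extend the multiplier by one bit if needed) and an implied 0 to the right of the LSB; both function names are illustrative.

```python
def bit_pair_recode(value, n):
    """Bit-pair recode an n-bit 2's-complement multiplier (n even), MSB pair first."""
    bits = value & ((1 << n) - 1)
    digits = []
    previous = 0                                  # implied 0 to the right of the LSB
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + previous)    # one of -2, -1, 0, +1, +2
        previous = b1
    return digits[::-1]

def multiply(multiplicand, multiplier, n):
    """Sum the selected multiples, each shifted two more positions to the left."""
    digits = bit_pair_recode(multiplier, n)[::-1]  # LSB pair first
    return sum((d * multiplicand) << (2 * i) for i, d in enumerate(digits))

print(bit_pair_recode(-9, 6), multiply(11, -9, 6))
print(bit_pair_recode(-11, 6), multiply(18, -11, 6))
```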
Restoring Division
Initialization:
A=0
Q= Dividend
M= Divisor
n= No of bits in the operand
Note: Both Q and M are positive integers represented using equal no of bits.
In A one extra bit is considered to keep track of the sign of the result of subtraction
operation.
Step 1: Shift A and Q left one binary position.
Step 2: Subtract M from A, and place the answer back in A

If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set
q0 to 1
Repeat these steps n times.
Restoring Division Example (15 / 5)
M = 0 0101, -M = 1 1011
             A        Q
Initially    0 0000   1111
ShiftL       0 0001   111_
Subtract     1 1011
             1 1100   1110     (A is negative: q0 = 0)
Restore      0 0101
             0 0001   1110     (carry discarded)
ShiftL       0 0011   110_
Subtract     1 1011
             1 1110   1100     (A is negative: q0 = 0)
Restore      0 0101
             0 0011   1100     (carry discarded)
ShiftL       0 0111   100_
Subtract     1 1011
             0 0010   1001     (carry discarded; A is positive: q0 = 1)
ShiftL       0 0101   001_
Subtract     1 1011
             0 0000   0011     (carry discarded; A is positive: q0 = 1)
Quotient = 0011 = 3
Remainder = 0000 = 0
Non restoring Division
In restoring division, after an unsuccessful subtraction (A negative), M is added back to A;
in the next cycle A is shifted left and M is subtracted from it.

A + M                          (restore)
2(A + M)                       (shift left)
2(A + M) - M                   (subtract)
= 2A + 2M - M = 2A + M

i.e., shift A to the left and ADD M directly to it.


Non restoring Division
Initialization:
A=0
Q= Dividend
M= Divisor
n= No of bits in the operand
Note: Both Q and M are positive integers represented using equal no of bits.
In A one extra bit is considered to keep track of the sign of the result of subtraction
operation.

Step 1: Repeat the following n times:
If the sign of A is 0, shift A and Q left one bit position and subtract M from A;
otherwise, shift A and Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
Non restoring Division Example (15 / 5)
M = 0 0101, -M = 1 1011
             A        Q
Initially    0 0000   1111
ShiftL       0 0001   111_
Subtract     1 1011
             1 1100   1110     (A is negative: q0 = 0)
ShiftL       1 1001   110_
Add          0 0101
             1 1110   1100     (A is negative: q0 = 0)
ShiftL       1 1101   100_
Add          0 0101
             0 0010   1001     (carry discarded; A is positive: q0 = 1)
ShiftL       0 0101   001_
Subtract     1 1011
             0 0000   0011     (carry discarded; A is positive: q0 = 1)
Quotient = 0011 = 3
Remainder = 0000 = 0
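Both division procedures can be sketched in Python for positive n-bit operands. Python integers are signed, so a negative A plays the role of "sign of A is 1"; the function names are illustrative.

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division; returns (quotient, remainder)."""
    a, q = 0, dividend
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= divisor
        if a < 0:
            a += divisor          # restore A; q0 stays 0
        else:
            q |= 1                # q0 = 1
    return q, a

def non_restoring_divide(dividend, divisor, n):
    """Non-restoring division; returns (quotient, remainder)."""
    a, q = 0, dividend
    for _ in range(n):
        negative = a < 0
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a = a + divisor if negative else a - divisor
        if a >= 0:
            q |= 1                # q0 = 1
    if a < 0:
        a += divisor              # Step 2: final correction of the remainder
    return q, a

print(restoring_divide(15, 5, 4), non_restoring_divide(15, 5, 4))
```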
Floating Point Numbers
IEEE Number Representation

Why Floating Point Numbers?
The maximum value that can be represented as an unsigned integer using 32 bits is
4,294,967,295, which is equal to 2^32 − 1.

For scientific calculations, we sometimes need a very big number like
6.0247 x 10^23 or a very small number like 9.1 x 10^-31.

For representing such numbers 32 bits are not sufficient; the conventional
methods would need a much larger number of bits.

Solution:
IEEE 754 Standard: Single Precision(32 bits)
Double Precision(64 bits)
Introduction to Basic Terms
For a number 6.0247 x 10^23
It consists of the following components:

a. Mantissa [Significant Digits]: 6.0247
b. Base: 10
c. Exponent: 23
d. Scale factor: 10^23
IEEE Representation
IEEE Floating Point notation is the standard representation in use. There are two
representations:
- Single precision.
- Double precision.

Fig. IEEE Single Precision

Fig. IEEE Double Precision


Biased Exponent
Using 8 bits we can have a signed exponent value in the range -2^7 to +(2^7 - 1).

A biased exponent means that we do not want to store the exponent as a signed number; we want
it stored only as a positive number.

So, to get only +ve values, we use a biased exponent: we add a bias to the
actual exponent and the result is stored as the exponent of the number.

Say the exponent is -5; it will be stored as -5 + bias.

In general, the bias is 2^(k-1) if the exponent is represented using k bits.
Biased Exponent
In IEEE single precision, the 8-bit stored exponent can range from 0 to 255.

But 0 and 255 are used for representing special numbers.

So, we store the exponent in the range 1 to 254.

If the exponent is represented using k (= 8) bits, IEEE takes the bias as 2^(k-1) - 1
(2^7 - 1 = 127).

Original Exponent Biased (Excess-127) Exponent


-126 1
-125 2
....
.......
127 254
Implicit Normalized Number
Say a number is 1001.01

Then to perform implicit normalization we need to bring the binary point immediately to the
right of the first (leftmost) 1 in the number.
1001.01 = 1.00101 x 2^3
Steps to convert a given decimal number into IEEE Format
Step1: Convert the number into Binary.

Step 2: Normalize (Implicit) the number.


If the binary point is moved to the left by i positions, then multiply the number by 2^i.
If the binary point is moved to the right by i positions, then multiply the number by
2^-i.

Step 3: If the number is +ve S=0, Else S=1

Step 4: Add 127 to the exponent , then write the binary of it for E’

Step 5: Mantissa is whatever is to the right of the decimal point in the


normalized number. If it is not of 23 bits, then append zeros to the right of the
actual mantissa to make it of length 23 bits.
Example : Convert -13.25 into IEEE Single Precision Format
Step1: Convert the number into Binary.

13.25 = 1101.01

Step 2: Normalize (Implicit) the number.

1101.01 = 1.10101 x 2^3

Step 3: If the number is +ve S=0, Else S=1


Here, S=1
Step 4: Add 127 to the exponent , then write the binary of it for E’

E’=127+3=130= 10000010
Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 23 bits, then append zeros to the right of the actual mantissa to
make it of length 23 bits.
Solution: S=1
E’=1000 0010
M= 1010 1000 0000 0000 0000 000
Example : Convert -13.25 into IEEE Double Precision Format
Step1: Convert the number into Binary.

13.25 = 1101.01

Step 2: Normalize (Implicit) the number.

1101.01 = 1.10101 x 2^3

Step 3: If the number is +ve S=0, Else S=1


Here, S=1
Step 4: Add 1023 to the exponent , then write the binary of it for E’

E’=1023+3=1026= 100 0000 0010


Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 52 bits, then append zeros to the right of the actual mantissa to
make it of length 52 bits.
Solution: S=1
E’=100 0000 0010
M= 1010 1000 0000 0000 0000 000.......0 (52 bits)
Example : Convert -17 into IEEE Format
Step1: Convert the number into Binary.

17 = 10001

Step 2: Normalize (Implicit) the number.

10001.0 = 1.0001 x 2^4

Step 3: If the number is +ve S=0, Else S=1


Here, S=1
Step 4: Add 127 to the exponent , then write the binary of it for E’

E’=127+4=131= 10000011
Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 23 bits, then append zeros to the right of the actual mantissa to
make it of length 23 bits.
Solution: S=1
E’=1000 0011
M= 0001 0000 0000 0000 0000 000
Example : Convert -0.35 into IEEE Format
Step1: Convert the number into Binary.

0.35 ≈ 0.01011 (truncated to five fractional bits; the binary expansion of 0.35 does not terminate)

Step 2: Normalize (Implicit) the number.

0.01011 = 1.011 x 2^-2

Step 3: If the number is +ve S=0, Else S=1


Here, S=1
Step 4: Add 127 to the exponent , then write the binary of it for E’

E’ = 127 + (-2) = 125 = 0111 1101


Step 5: Mantissa is whatever is to the right of the binary point in the normalized
number. If it is not of 23 bits, then append zeros to the right of the actual mantissa to
make it of length 23 bits.
Solution (for the truncated value): S=1
E’= 0111 1101
M= 0110 0000 0000 0000 0000 000
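The single precision conversions above can be cross-checked with Python's standard struct module, which packs a float into its 32-bit IEEE 754 pattern. The function name ieee754_single is illustrative. For -0.35 the hardware-rounded mantissa carries the full repeating pattern, so it differs from the hand-truncated value, while S and E’ agree.

```python
import struct

def ieee754_single(x):
    """Return (S, E', M) of x in IEEE 754 single precision as integers."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF          # the 23 stored mantissa bits
    return sign, biased_exponent, mantissa

for x in (-13.25, -17.0, -0.35):
    s, e, m = ieee754_single(x)
    print(f"{x}: S={s} E'={e:08b} M={m:023b}")
```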
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the value represented by the following 32 bits in IEEE-754
REPRESENTATION?

S=0
E’= 10000011
M=11000....................00

Value represented = (-1)^0 x 1.11 x 2^(131-127) [10000011 = 131]
= +1.11 x 2^4
= (11100)2
= +(28)10
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the value represented by the following 32 bits in IEEE-754
REPRESENTATION?

S=0
E’= 10000000
M=11000....................00

Value represented = (-1)^0 x 1.11 x 2^(128-127) [10000000 = 128]
= +1.11 x 2^1
= (11.1)2
= +(3.5)10
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What can be the maximum value represented by the 32 bits in IEEE-754
REPRESENTATION?

S=0
E’= 11111110
M=11111..................11

Value represented = (-1)^0 x 1.11..................1 x 2^(254-127)
= 111................1 x 2^-23 x 2^127 [24 ones]
= (2^24 - 1) x 2^104
≈ 2^128
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What can be the minimum (most negative) value represented by the 32 bits in IEEE-754
REPRESENTATION?

S=1
E’= 11111110
M=11111..................11

Value represented = (-1)^1 x 1.11..................1 x 2^(254-127)
= -111................1 x 2^-23 x 2^127 [24 ones]
= -(2^24 - 1) x 2^104
≈ -2^128
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What can be the minimum positive (normalized) value represented by the 32 bits in IEEE-754
REPRESENTATION?

S=0
E’= 00000001
M=0000.....0000

Value represented = (-1)^0 x 1.00..................0 x 2^(1-127)
= 1 x 2^-126
= 2^-126
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What can be the maximum negative (normalized) value, i.e., the negative value closest to zero,
represented by the 32 bits in IEEE-754 REPRESENTATION?

S=1
E’= 00000001
M=0000.....0000

Value represented = (-1)^1 x 1.00..................0 x 2^(1-127)
= -1 x 2^-126
= -2^-126
Denormalized Number

Denormalized Number: a very small number which cannot be
normalized (implicit) [for single precision].

Let us say a number is 0.0000000..............................11 [131 places after the
binary point].

After implicit normalization =>

1.1 x 2^-130

M=1
E’ = -130 + 127 = -3

In the conventional method this number cannot be stored, as the biased exponent E’ is -ve.

But, in the IEEE Standard, this will be treated as a special number.
Denormalized Number

The IEEE-754 standard says that a small number is normalized only down to an exponent of -126.

Let us say a number is 0.0000000..............................011 [130 places after the
binary point].

After normalization down to 2^-126 =>

0.0011 x 2^-126

M=0011 000...0
E’=00000000
This is an example of a denormalized number.

Value formula for a denormalized number: (-1)^S x 0.M x 2^-126 [exponent = -(bias - 1)]


SPECIAL VALUES
S     E’                             M              Number
0     00000000                       000......0     +0
1     00000000                       000......0     -0
0     11111111                       000......0     +∞
1     11111111                       000......0     -∞
0/1   11111111                       M≠0            NaN
0/1   00000000                       M≠0            Denormalized Number
0/1   E’≠00000000 and E’≠11111111    M=XXX......X   Implicit Normalized Number
Floating point arithmetic: ADD/SUB rule
Choose the number with the smaller exponent.
Shift its mantissa right until the exponents of both the numbers are equal.
Add or subtract the mantissas.
Determine the sign of the result.
Normalize the result if necessary and truncate/round to the number of mantissa
bits.
Floating point arithmetic: MUL rule

Add the exponents.


Subtract the bias.
Multiply the mantissas and determine the sign of the result.
Normalize the result (if necessary).
Truncate/round the mantissa of the result.
Floating point arithmetic: DIV rule

Subtract the exponents


Add the bias.
Divide the mantissas and determine the sign of the result.
Normalize the result if necessary.
Truncate/round the mantissa of the result.
Floating Point Arithmetic: ADD/SUB rule

Number 1:                 Number 2:
S=0                       S=0
E’= 1000 0100             E’= 1000 0011
M=000111100.....0         M=0100100...00000

Number 1:                 Number 2:
1.0001111 x 2^5           1.01001 x 2^4

Choose the number with the smaller exponent and shift its mantissa right until the
exponents of both the numbers are equal. So number 2 becomes
0.1010010 x 2^5. Now perform the addition:

  1.0 0 0 1 1 1 1 x 2^5
+ 0.1 0 1 0 0 1 0 x 2^5

  1.1 1 0 0 0 0 1 x 2^5
Value Represented
Value represented = (-1)^S x 0.M x 2^(-126)
What is the value represented by the following 32 bits in IEEE-754
representation?

S=0
E’= 00000000
M=11000....................00 ≠ 0 => Denormalized number

Value represented = (-1)^0 x 0.11 x 2^(-126)
= 11.0 x 2^(-2) x 2^(-126)
= 3 x 2^(-128)
GATE QUESTIONS

Consider three registers R1, R2, and R3 that store numbers in IEEE−754 single
precision floating point format. Assume that R1 and R2 contain the values (in
hexadecimal notation) 0x42200000 and 0xC1200000, respectively.

If R3 = R1 / R2, what is the value stored in R3 ?


(A) 0x40800000
(B) 0xC0800000
(C) 0x83400000
(D) 0xC8500000
R1 = 0x42200000
R1 = 0100 0010 0010 0000 ... 0000

R1: S = 0
E’ = 10000100 = 132, so the exponent is 132 - 127 = 5
M = 010000000...0

R1 = 1.0100 x 2^5 = (101000)2 = 40

R2 = 0xC1200000
R2 = 1100 0001 0010 0000 ... 0000

R2: S = 1
E’ = 1000 0010 = 130, so the exponent is 130 - 127 = 3
M = 010000000...0
R2 = -1.0100 x 2^3 = -(1010)2 = -10
So, R1/R2 = 40/(-10) = -4
-4 = -100.0 x 2^0 = -1.00 x 2^2, so S = 1, E’ = 2 + 127 = 129 = 1000 0001, M = 0000000...0
R3 = 1100 0000 1000 0000 ... 0000 = 0xC0800000, which is option (B).
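
The hand calculation above can be cross-checked in software. The following is a minimal sketch that assumes Python's standard `struct` module for reinterpreting bit patterns; the helper names `bits_to_float` and `float_to_bits` are illustrative and not part of the original notes.

```python
import struct

def bits_to_float(word):
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def float_to_bits(value):
    """Return the 32-bit IEEE-754 single-precision pattern of a float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

r1 = bits_to_float(0x42200000)   # contents of R1
r2 = bits_to_float(0xC1200000)   # contents of R2
r3 = float_to_bits(r1 / r2)      # pattern stored in R3
print(r1, r2, hex(r3))
```

Both operands and the quotient are exactly representable in single precision, so no rounding is involved in this particular check.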
THANK YOU
Arithmetic
Chapter 4
Addition/subtraction of signed
numbers
Truth table for the ith stage:

xi   yi   Carry-in ci   |   Sum si   Carry-out ci+1
0    0    0             |   0        0
0    0    1             |   1        0
0    1    0             |   1        0
0    1    1             |   0        1
1    0    0             |   1        0
1    0    1             |   0        1
1    1    0             |   0        1
1    1    1             |   1        1

At the ith stage:
Input: ci is the carry-in
Output: si is the sum, ci+1 is the carry-out to the (i+1)st stage

si = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci = xi XOR yi XOR ci   (a prime denotes complement)
ci+1 = yi ci + xi ci + xi yi

Example:
  X        7        0 1 1 1
+ Y  =   + 6  =   + 0 1 1 0
  Z       13        1 1 0 1

Legend for stage i: inputs xi, yi and carry-in ci; outputs si and carry-out ci+1.
Addition logic for a single stage

[Figure: gate-level Sum circuit (inputs xi, yi, ci; output si) and Carry circuit (inputs xi, yi, ci; output ci+1), together with the full adder (FA) block symbol.]

Full Adder (FA): Symbol for the complete circuit for a single stage of addition.
n-bit adder
•Cascade n full adder (FA) blocks to form an n-bit adder.
•Carries propagate, or ripple, through this cascade; hence the name n-bit ripple carry adder.

[Figure: n FA blocks in cascade. Stage i takes xi, yi and ci and produces si and ci+1; c0 enters at the least significant bit (LSB) position and cn leaves the most significant bit (MSB) position.]

Carry-in c0 into the LSB position provides a convenient way to perform subtraction.
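
The cascade described above can be sketched in software. This is a minimal illustration, assuming bits are held in Python lists with the LSB first; the function names `full_adder` and `ripple_carry_add` are hypothetical. It uses the sum and carry equations of a single stage given earlier.

```python
def full_adder(x, y, c):
    """One FA stage: returns (sum bit, carry-out) for bits x, y and carry-in c."""
    s = x ^ y ^ c
    c_out = (y & c) | (x & c) | (x & y)
    return s, c_out

def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first); returns (sum bits, cn)."""
    carry = c0
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        # the carry-out of stage i becomes the carry-in of stage i+1
        s, carry = full_adder(x, y, carry)
        s_bits.append(s)
    return s_bits, carry

# 7 + 6 with 4-bit operands, written LSB first: 0111 + 0110
s, cn = ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0])
print(s, cn)
```

The loop makes the rippling explicit: stage i cannot finish until stage i-1 has produced its carry.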
kn-bit adder
kn-bit numbers can be added by cascading k n-bit adders.

[Figure: k n-bit adder blocks in cascade. The rightmost block takes x(n-1)..x0, y(n-1)..y0 and c0 and produces s(n-1)..s0 and cn; the leftmost block takes x(kn-1)..x((k-1)n), y(kn-1)..y((k-1)n) and produces s(kn-1)..s((k-1)n) and ckn.]

Each n-bit adder forms a block, so this is cascading of blocks.

Carries ripple or propagate through blocks, hence the name Blocked Ripple Carry Adder.
n-bit subtractor
•Recall X - Y is equivalent to adding 2’s complement of Y to X.
•2’s complement is equivalent to 1’s complement + 1.
•1’s complement of a number can be obtained by XORing each bit of the number with 1.
•2’s complement of positive and negative numbers is computed similarly.

[Figure: n-bit ripple carry adder built from FA blocks (inputs xi, yi; outputs si; MSB position on the left, LSB position on the right) with the carry-in of the LSB stage set to 1.]
n-bit adder/subtractor (contd..)

[Figure: n-bit adder with inputs x(n-1)..x0 and y(n-1)..y0, carry-in c0, outputs s(n-1)..s0 and carry-out cn. The Add/Sub control line acts on the y inputs and on c0.]

•Add/sub control = 0, addition.
•Add/sub control = 1, subtraction.
Detecting overflows
⚫ Overflows can only occur when the sign of the two operands is
the same.
⚫ Overflow occurs if the sign of the result is different from the
sign of the operands.
⚫ Recall that the MSB represents the sign.
⚫ xn-1, yn-1, sn-1 represent the sign of operand x, operand y and result s
respectively.
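⚫ Circuit to detect overflow can be implemented by the following
logic expression, which follows from the two rules above:
Overflow = xn-1 yn-1 sn-1' + xn-1' yn-1' sn-1   (a prime denotes complement)
Examples to try: (-4) + (-6) and 2 + 4.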
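
As an illustration of the add/subtract control and the overflow rule, here is a hedged sketch. It assumes operands are passed as n-bit patterns (0 to 2^n - 1), that the control line is XORed with every y bit and also fed into c0, and that overflow is flagged when both adder inputs have the same sign and the result sign differs. The function name `add_sub` is hypothetical.

```python
def add_sub(x, y, n, sub):
    """n-bit 2's-complement add (sub=0) or subtract (sub=1).

    x and y are n-bit patterns; returns (result pattern, overflow flag).
    """
    mask = (1 << n) - 1
    y_in = y ^ (mask if sub else 0)   # XOR each y bit with the control line
    s = (x + y_in + sub) & mask       # the control line is also the carry-in c0

    def sign(v):
        return (v >> (n - 1)) & 1

    xs, ys, ss = sign(x), sign(y_in), sign(s)
    # overflow: adder inputs share a sign and the result sign differs
    overflow = (xs & ys & (1 - ss)) | ((1 - xs) & (1 - ys) & ss)
    return s, overflow

# 4-bit examples: (-4) + (-6) and 2 + 4
print(add_sub(0b1100, 0b1010, 4, 0))
print(add_sub(0b0010, 0b0100, 4, 0))
```

With 4-bit operands, (-4) + (-6) = -10 falls outside the range -8 to +7, so the flag is raised, while 2 + 4 = 6 fits.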
Computing the add time
Consider the 0th stage:
•c1 is available after 2 gate delays.
•s0 is available after 1 gate delay.

[Figure: FA block for stage 0 (inputs x0, y0, c0; outputs s0, c1) alongside the gate-level Sum and Carry circuits.]
Computing the add time (contd..)
Cascade of 4 Full Adders, or a 4-bit adder

[Figure: four FA blocks in cascade with inputs x3 y3, x2 y2, x1 y1, x0 y0, carries c4, c3, c2, c1, c0 and sums s3, s2, s1, s0.]

•s0 available after 1 gate delay, c1 available after 2 gate delays.
•s1 available after 3 gate delays, c2 available after 4 gate delays.
•s2 available after 5 gate delays, c3 available after 6 gate delays.
•s3 available after 7 gate delays, c4 available after 8 gate delays.

For an n-bit adder, sn-1 is available after 2n-1 gate delays and cn is available after 2n gate delays.
Fast addition
Recall the equations:
si = xi XOR yi XOR ci
ci+1 = xi yi + xi ci + yi ci

Second equation can be written as:
ci+1 = xi yi + (xi + yi) ci

We can write:
ci+1 = Gi + Pi ci, where Gi = xi yi and Pi = xi + yi

•Gi is called generate function and Pi is called propagate function
•Gi and Pi are computed only from xi and yi and not ci, thus they can
be computed in one gate delay after X and Y are applied to the
inputs of an n-bit adder.
Carry lookahead

•All carries can be obtained 3 gate delays after X, Y and c0 are applied.
-One gate delay for Pi and Gi
-Two gate delays in the AND-OR circuit for ci+1
•All sums can be obtained 1 gate delay after the carries are computed.
•Independent of n, n-bit addition requires only 4 gate delays.
•This is called Carry Lookahead adder.
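
To make the lookahead idea concrete, the sketch below computes every carry directly from the generate and propagate functions and c0, without waiting for earlier carries. It assumes the usual definitions Gi = xi yi and Pi = xi + yi, and bit lists stored LSB first; the function name `carry_lookahead_add` is illustrative. In hardware all the product terms are evaluated in parallel, whereas the loops here only enumerate them.

```python
def carry_lookahead_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first) using flat AND-OR carries."""
    n = len(x_bits)
    G = [x & y for x, y in zip(x_bits, y_bits)]   # generate functions
    P = [x | y for x, y in zip(x_bits, y_bits)]   # propagate functions
    carries = [c0]
    for i in range(n):
        # c(i+1) = Gi + Pi G(i-1) + ... + Pi..P1 G0 + Pi..P0 c0
        c_next = 0
        for j in range(i, -1, -1):
            term = G[j]
            for k in range(j + 1, i + 1):
                term &= P[k]
            c_next |= term
        term = c0
        for k in range(i + 1):
            term &= P[k]
        c_next |= term
        carries.append(c_next)
    s_bits = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carries)]
    return s_bits, carries[n]

print(carry_lookahead_add([1, 1, 1, 0], [0, 1, 1, 0]))   # 7 + 6, LSB first
```

Each carry depends only on the G and P values and c0, which is what allows a two-level AND-OR implementation.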
Carry-lookahead adder

[Figure: 4-bit carry-lookahead adder. Four B cells (inputs x3 y3, x2 y2, x1 y1, x0 y0 and carries c3, c2, c1, c0; outputs s3, s2, s1, s0 and G3 P3, G2 P2, G1 P1, G0 P0) feed the carry-lookahead logic, which produces c1, c2, c3 and c4. Inset: B cell for a single stage, with inputs xi, yi, ci and outputs Gi, Pi, si.]
Carry lookahead adder (contd..)
⚫ Performing n-bit addition in 4 gate delays independent of n
is good only theoretically because of fan-in constraints.

ci+1 = Gi + Pi Gi-1 + Pi Pi-1 Gi-2 + ... + Pi Pi-1 ... P1 G0 + Pi Pi-1 ... P0 c0

⚫ Last AND gate and OR gate require a fan-in of (n+1) for an n-bit adder.
⚫ For a 4-bit adder (n=4) fan-in of 5 is required.
⚫ This is the practical limit for most gates.
⚫ In order to add operands longer than 4 bits, we can cascade
4-bit Carry-Lookahead adders. Cascade of Carry-Lookahead
adders is called Blocked Carry-Lookahead adder.
4-bit carry-lookahead Adder
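Blocked Carry-Lookahead adder
Carry-out from a 4-bit block can be given as:
c4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0

Rewrite this as:
c4 = G0I + P0I c0, where P0I = P3 P2 P1 P0 and G0I = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0

Subscript I denotes the blocked carry lookahead and identifies the block.

Cascade 4 4-bit adders, c16 can be expressed as:
c16 = G3I + P3I G2I + P3I P2I G1I + P3I P2I P1I G0I + P3I P2I P1I P0I c0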
Blocked Carry-Lookahead adder

[Figure: four 4-bit adder blocks in cascade with inputs x15-12 y15-12, x11-8 y11-8, x7-4 y7-4, x3-0 y3-0, carries c16, c12, c8, c4, c0 and sums s15-12, s11-8, s7-4, s3-0. Each block sends its outputs G3I P3I, G2I P2I, G1I P1I, G0I P0I to the carry-lookahead logic.]

After xi, yi and c0 are applied as inputs:
- Gi and Pi for each stage are available after 1 gate delay.
- PI is available after 2 and GI after 3 gate delays.
- All carries are available after 5 gate delays.
- c16 is available after 5 gate delays.
- s15, which depends on c12, is available after 8 (5+3) gate delays.
(Recall that for a 4-bit carry lookahead adder, the last sum bit is
available 3 gate delays after all inputs are available.)
Multiplication
Multiplication of unsigned numbers

Product of 2 n-bit numbers is at most a 2n-bit number.


Unsigned multiplication can be viewed as addition of shifted
versions of the multiplicand.
Multiplication of 1101 X 1011
In a two-level cache system, the access times of L1 and L2 caches are 1
and 8 clock cycles respectively. The miss penalty from L2 cache to main
memory is 18 clock cycles. The miss rate of L1 cache is twice that of L2.
The average memory access time of the cache system is 2 cycles. The miss
rates of L1 and L2 caches respectively are:
a. 0.130 and 0.065
b. 0.056 and 0.111
c. 0.0892 and 0.1784
d. 0.1784 and 0.0892

Correct answer is (a).


Let the miss rate of L2 cache be x.
So, miss rate of L1 cache = 2x.
Thus, average memory access time
AMAT = (1-2x).1 + 2x. [(1-x).8 + x.18] = 2 (given)
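
Expanding the AMAT expression above gives a quadratic in x: 20x^2 + 14x - 1 = 0. The sketch below solves it numerically; it assumes exactly the access-time model written above (L1 hit cost 1, L2 hit cost 8, L2 miss cost 18), and the function name and parameter names are illustrative.

```python
import math

def solve_l2_miss_rate(t_l1=1.0, t_l2=8.0, t_mem=18.0, amat=2.0):
    """Solve (1-2x)*t_l1 + 2x*((1-x)*t_l2 + x*t_mem) = amat for the L2 miss rate x."""
    # Expanded form: 2(t_mem - t_l2) x^2 + 2(t_l2 - t_l1) x + (t_l1 - amat) = 0
    a = 2.0 * (t_mem - t_l2)
    b = 2.0 * (t_l2 - t_l1)
    c = t_l1 - amat
    disc = b * b - 4.0 * a * c
    return (-b + math.sqrt(disc)) / (2.0 * a)   # positive root

x = solve_l2_miss_rate()
print(round(2 * x, 3), round(x, 3))   # L1 miss rate, L2 miss rate
```

Under this model the positive root is x ≈ 0.065, so the L1 miss rate 2x is about 0.13, in line with option (a).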
Problems
Multiplication of unsigned numbers
(contd..)
⚫ We added the partial products at end.
⚫ Alternative would be to add the partial products at each stage.

⚫ Rules to implement multiplication are:


⚫ If the ith bit of the multiplier is 1, shift the multiplicand and add the
shifted multiplicand to the current value of the partial product.
⚫ Hand over the partial product to the next stage
⚫ Value of the partial product at the start stage is 0.
Multiplication of unsigned numbers
Typical multiplication cell

[Figure: one cell of the array. A full adder (FA) receives a bit of the incoming partial product (PPi), the jth multiplicand bit gated by the ith multiplier bit, and a carry in; it produces a bit of the outgoing partial product (PP(i+1)) and a carry out.]
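Combinatorial array multiplier

[Figure: 4 x 4 array of multiplication cells. The multiplicand bits m3, m2, m1, m0 (each with an incoming 0 for PP0) enter at the top, the multiplier bits q0, q1, q2, q3 select the rows, the partial products PP1, PP2, PP3 pass downward, and the product bits p7 ... p0 emerge along the right and bottom edges.]

Product is: p7,p6,..p0

Multiplicand is shifted by displacing it through an array of adders.
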
Combinatorial array multiplier
(contd..)
⚫ Combinatorial array multipliers are:
⚫ Extremely inefficient.
⚫ Have a high gate count for multiplying numbers of practical size such as 32-bit
or 64-bit numbers.
⚫ Perform only one function, namely, unsigned integer product.

Improve gate efficiency by using a mixture of combinatorial array techniques and
sequential techniques requiring less combinational logic.
Sequential multiplication
⚫ Recall the rule for generating partial products:
⚫ If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to
the current partial product.
⚫ Multiplicand has been shifted left when added to the partial product.

However, adding a left-shifted multiplicand to an unshifted partial product is
equivalent to adding an unshifted multiplicand to a right-shifted partial product.

Say the Multiplicand M is 1101 and PP is 0101.

Category 1:                  Category 2:
Left Shift M : 11010         Right Shift PP : 0010 1
PP           : 0101          M              : 1101
ADD them                     ADD them
  1 1010                       0010 1
+   0101                     + 1101
  1 1111                       1111 1
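Sequential Circuit Multiplier

[Figure: register A (initially 0, bits an-1 ... a0) with carry flip-flop C and multiplier register Q (bits qn-1 ... q0) shift right together; an n-bit adder adds either the multiplicand M (bits mn-1 ... m0) or 0, selected by a MUX under the Add/Noadd control from the control sequencer.]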
Control Logic and Registers
n bit registers, 1 bit carry register C

Register set up
Q register ← multiplier
M register ← multiplicand
A register ← 0
C←0

C for carries after addition


Product will be 2n bits in A Q registers.
Algorithm for unsigned binary multiplication (Sequential Multiplier Algorithm)
Step-1: Initialize C=A=0, Q=multiplier value, M=multiplicand value

Step-2: Do the following n times
i) If q0 = 1, add M into A (perform A = A + M) and store the carry in C.
If q0 = 0, No-Add (do not do anything).
ii) Shift C, A, Q right one bit so that
an-1 ← C (the carry flag bit is shifted into the MSB of A)
qn-1 ← a0 (the LSB of A is shifted into the MSB of Q)
q0 is lost

Step-3: Product is in A, Q
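
The three steps above can be sketched as a short routine. This is an illustrative model only, assuming Python integers stand in for the n-bit registers C, A, Q and M; the function name `sequential_multiply` is hypothetical.

```python
def sequential_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit shift-and-add multiplication using registers C, A, Q, M."""
    mask = (1 << n) - 1
    c, a, q, m = 0, 0, multiplier & mask, multiplicand & mask
    for _ in range(n):
        if q & 1:                      # q0 = 1: A = A + M, carry goes to C
            total = a + m
            c, a = total >> n, total & mask
        # shift C, A, Q right by one bit; q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                # 2n-bit product held in A, Q

print(sequential_multiply(0b1101, 0b1011, 4))   # 13 x 11
```

Only one n-bit adder is needed, at the cost of n add-and-shift cycles.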
Flow Chart for unsigned binary multiplication
Sequential multiplication (contd..)
1101 x 1011, M = 1 1 0 1

C   A         Q         Operation               Cycle
0   0 0 0 0   1 0 1 1   Initial configuration
0   1 1 0 1   1 0 1 1   Add                     First cycle
0   0 1 1 0   1 1 0 1   Shift
1   0 0 1 1   1 1 0 1   Add                     Second cycle
0   1 0 0 1   1 1 1 0   Shift
0   1 0 0 1   1 1 1 0   No add                  Third cycle
0   0 1 0 0   1 1 1 1   Shift
1   0 0 0 1   1 1 1 1   Add                     Fourth cycle
0   1 0 0 0   1 1 1 1   Shift

Product = A Q = 1 0 0 0 1 1 1 1 (143)
Signed Multiplication
10011 x 01011 = -13 x 11
Signed Multiplication: Rule 1
⚫ Positive multiplier and negative multiplicand: extend the sign bit value of the
multiplicand to the left as far as the product will extend.

            1 0 0 1 1   (-13)
          x 0 1 0 1 1   (+11)

  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0

  1 1 0 1 1 1 0 0 0 1   (-143)

The leading 1s of each nonzero summand are the sign extension (shown in blue on the slide).

Sign extension of negative multiplicand.

01101 x 10101 = 13 x -11
Take the 2’s complement of 01101: 10011
Take the 2’s complement of 10101: 01011

            1 0 0 1 1
          x 0 1 0 1 1

  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0

  1 1 0 1 1 1 0 0 0 1   (-143)
Signed Multiplication : Rule 2

⚫ For a negative multiplier, a straightforward solution is to
form the 2’s-complement of both the multiplier and the
multiplicand and proceed as in the case of a positive
multiplier.
⚫ This is possible because complementation of both
operands does not change the value or the sign of the
product.
⚫ A technique that works equally well for both negative and
positive multipliers: Booth algorithm.
01101 x 10101 = 13 x -11
Flow Chart for unsigned binary multiplication
Booth Algorithm [Uniform
Rule]
⚫ Consider a multiplication in which the multiplier is the positive number
0011110. How many appropriately shifted versions of
the multiplicand are added in the standard procedure?

Multiplicand: 0 1 0 1 1 0 1, multiplier: 0 0 +1 +1 +1 +1 0

                0 1 0 1 1 0 1

                0 0 0 0 0 0 0
              0 1 0 1 1 0 1
            0 1 0 1 1 0 1
          0 1 0 1 1 0 1
        0 1 0 1 1 0 1
      0 0 0 0 0 0 0
    0 0 0 0 0 0 0

  0 0 0 1 0 1 0 1 0 0 0 1 1 0
Booth Algorithm
M x 0011110 [30] = M x (2^5 - 2^1)   [30 = 32 - 2 = 2^5 - 2^1]

We know that multiplying something by 2^i is equivalent to shifting the number
left by i positions.
We know that multiplying something by -2^i is equivalent to shifting the 2’s complement
of the number left by i positions.

Multiplicand: 0 1 0 1 1 0 1, recoded multiplier: 0 +1 0 0 0 -1 0

  0 0 0 0 0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 1 1 0 1 0 0 1 1       (2's complement of the multiplicand)
  0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 1 0 1 1 0 1
  0 0 0 0 0 0 0 0

  0 0 0 1 0 1 0 1 0 0 0 1 1 0
-13 x 11
+13 = 01101        +11 = 01011
-13 = 10011
Booth recoding of +11 (01011) = +1 -1 +1 0 -1
Check: +2^4 - 2^3 + 2^2 - 2^0 = 16 - 8 + 4 - 1 = 11

Multiplicand: 1 0 0 1 1, recoded multiplier: +1 -1 +1 0 -1

  0 0 0 0 0 0 1 1 0 1
  0 0 0 0 0 0 0 0 0
  1 1 1 1 0 0 1 1
  0 0 0 1 1 0 1
  1 1 0 0 1 1

  1 1 0 1 1 1 0 0 0 1

Ans = -(2’s complement of 1 1 0 1 1 1 0 0 0 1)
= -(0 0 1 0 0 0 1 1 1 1) = -143
Booth Algorithm
⚫ In general, in the Booth scheme, -1 times the shifted multiplicand is selected
when moving from 0 to 1, and +1 times the shifted multiplicand is selected
when moving from 1 to 0, as the multiplier is scanned from right to left.

 0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0

 0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Booth recoding of a multiplier.
Booth Algorithm

0 1 1 0 1 (+13) x 1 1 0 1 0 (-6), recoded multiplier: 0 -1 +1 -1 0

            0 1 1 0 1

  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0

  1 1 1 0 1 1 0 0 1 0   (-78)

Booth multiplication with a negative multiplier.

13 x -6 = 01101 x 11010   [-13 = 10011]
Booth Algorithm
Multiplier             Version of multiplicand
Bit i     Bit i-1      selected by bit i
0         0             0 x M
0         1            +1 x M
1         0            -1 x M
1         1             0 x M

Booth multiplier recoding table.
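
The recoding table can be applied mechanically. The sketch below is one possible way to express it, assuming the multiplier is given as a bit string with the MSB first; the function name `booth_recode` is hypothetical. Each recoded digit is simply (bit i-1) minus (bit i), with an implied 0 to the right of the LSB.

```python
def booth_recode(bits):
    """Booth-recode a 2's-complement multiplier given as a string, MSB first.

    Returns a list of digits in {-1, 0, +1}, MSB first.
    """
    padded = bits + "0"   # implied 0 to the right of the LSB
    # digit for bit i = (bit to its right) - (bit i), as in the table
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]

print(booth_recode("11010"))     # multiplier -6
print(booth_recode("0011110"))   # multiplier +30
```

Summing digit x 2^position over the recoded digits gives back the signed value of the multiplier, which is a handy sanity check.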
Booth Algorithm

⚫ Best case: a long string of 1’s (skipping over 1s)
⚫ Worst case: 0’s and 1’s are alternating

Worst-case multiplier:
 0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

Ordinary multiplier:
 1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
 0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0

Good multiplier:
 0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
 0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1
FlowChart for Booth’s Algorithm
Booth’s Algorithm (-7 x 3): +7 = 0111, -7 = 1001

Register A   Multiplier register Q   Q-1   Multiplicand register M   Operation               Remark
0000         0011                    0     1001                      Initially
0111         0011                    0                               A=A-M=A+ 2s(M)          1st Cycle
0011         1001                    1                               Shift A, Q, Q-1 right
0011         1001                    1                               No Add/Sub              2nd Cycle
0001         1100                    1                               Shift A, Q, Q-1 right
1010         1100                    1                               A=A+M                   3rd Cycle
1101         0110                    0                               Shift A, Q, Q-1 right
1101         0110                    0                               No Add/Sub              4th Cycle
1110         1011                    0                               Shift A, Q, Q-1 right

Product = A Q = 1110 1011 = -21
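
The register trace above can be reproduced with a small model. This is a sketch under stated assumptions: Python integers masked to n bits stand in for A, Q and M, the shift is an arithmetic right shift of A, Q, Q-1, and the most negative multiplicand (-2^(n-1)) is not handled because A has only n bits. The function name `booth_multiply` is illustrative.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit 2's-complement integers using registers A, Q, Q-1, M."""
    mask = (1 << n) - 1
    a, q, q_1, m = 0, multiplier & mask, 0, multiplicand & mask
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask          # A = A - M
        elif pair == (0, 1):
            a = (a + m) & mask          # A = A + M
        # arithmetic shift right of A, Q, Q-1 (sign bit of A is kept)
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):    # reinterpret the 2n-bit pattern as signed
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-7, 3, 4))
```

The same loop handles positive and negative multipliers, which is the "uniform rule" property claimed for the Booth algorithm.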
Fast Multiplication
Bit-Pair Recoding of Multipliers
⚫ Bit-pair recoding halves the maximum number of summands (versions of
the multiplicand).

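Multiplier 1 1 0 1 0 (-6), with one sign-extension bit added on the left and an implied 0 to the right of the LSB:

Extended multiplier:   1  1  1  0  1  0  (0)
Booth recoding:        0  0 -1 +1 -1  0
Bit-pair recoding:       0    -1    -2

(a) Example of bit-pair recoding derived from Booth recoding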


Bit-Pair Recoding of Multipliers

Multiplier bit-pair    Multiplier bit on the right    Multiplicand selected
i+1     i              i-1                            at position i
0       0              0                               0 x M
0       0              1                              +1 x M
0       1              0                              +1 x M
0       1              1                              +2 x M
1       0              0                              -2 x M
1       0              1                              -1 x M
1       1              0                              -1 x M
1       1              1                               0 x M

(b) Table of multiplicand selection decisions
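
The selection table is equivalent to computing -2 x (bit i+1) + (bit i) + (bit i-1) for each pair. The sketch below uses that identity; it assumes the multiplier is given as a bit string (MSB first) and is sign-extended by one bit when its length is odd. The function name `bit_pair_recode` is hypothetical.

```python
def bit_pair_recode(bits):
    """Bit-pair (radix-4) recoding of a 2's-complement multiplier, MSB first.

    Returns one digit in {-2, -1, 0, +1, +2} per bit pair, MSB first.
    """
    if len(bits) % 2:
        bits = bits[0] + bits      # sign-extend to an even number of bits
    padded = bits + "0"            # implied 0 to the right of the LSB
    digits = []
    for i in range(0, len(bits), 2):
        b_hi, b_lo, b_right = (int(ch) for ch in padded[i:i + 3])
        digits.append(-2 * b_hi + b_lo + b_right)
    return digits

print(bit_pair_recode("11010"))   # multiplier -6
```

Each digit carries weight 4^position, so an n-bit multiplier yields about n/2 summands.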


Bit-Pair Recoding of Multipliers

0 1 1 0 1 (+13) x 1 1 0 1 0 (-6)

Using Booth recoding (0 -1 +1 -1 0):

            0 1 1 0 1

  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0

  1 1 1 0 1 1 0 0 1 0   (-78)

Using bit-pair recoding (0 -1 -2):

            0 1 1 0 1

  1 1 1 1 1 0 0 1 1 0
  1 1 1 1 0 0 1 1
  0 0 0 0 0 0

  1 1 1 0 1 1 0 0 1 0   (-78)

X (-2) = Take the 2’s complement and then shift left by 1 position
X (-1) = Take the 2’s complement of the operand
X (2) = Shift the operand by 1 bit position to the left

Multiplication requiring only n/2 summands.


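Carry-Save Addition of Summands
⚫ CSA speeds up the addition process.

[Figure: multiplier array producing product bits P7 P6 P5 P4 P3 P2 P1 P0.]
Carry-Save Addition of Summands(Cont.,)

[Figure: multiplier array producing product bits P7 P6 P5 P4 P3 P2 P1 P0.]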
Carry-Save Addition of Summands(Cont.,)
⚫ Consider the addition of many summands, we can:
- Group the summands in threes and perform carry-save addition on
each of these groups in parallel to generate a set of S and C vectors in
one full-adder delay.
- Group all of the S and C vectors into threes, and perform carry-save
addition on them, generating a further set of S and C vectors in one
more full-adder delay.
- Continue with this process until there are only two vectors remaining.
- They can be added in a RCA or CLA to produce the desired product.
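
The grouping procedure above can be sketched as follows. This is an illustrative model, assuming summands are ordinary Python integers (already shifted to their weights) and that leftover vectors which do not fill a group of three are passed to the next level unchanged; the function names `carry_save_add` and `csa_sum` are hypothetical.

```python
def carry_save_add(a, b, c):
    """3-to-2 reduction: returns (S, C) such that a + b + c == S + C."""
    s = a ^ b ^ c                                  # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # carries weighted one position left
    return s, carry

def csa_sum(summands):
    """Reduce the summands with carry-save levels, then do one final addition."""
    vectors = list(summands)
    while len(vectors) > 2:
        nxt = []
        full = len(vectors) - len(vectors) % 3
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[full:])                 # leftovers pass through unchanged
        vectors = nxt
    return sum(vectors)                            # final carry-propagate addition

# 45 x 63: six summands, each a shifted copy of the multiplicand 101101
print(csa_sum([45 << i for i in range(6)]))
```

Only the last step needs a carry-propagate adder; every earlier level costs one full-adder delay regardless of word length.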
Carry-Save Addition of Summands

              1 0 1 1 0 1     (45) M
            x 1 1 1 1 1 1     (63) Q

              1 0 1 1 0 1     A
            1 0 1 1 0 1       B
          1 0 1 1 0 1         C
        1 0 1 1 0 1           D
      1 0 1 1 0 1             E
    1 0 1 1 0 1               F

  1 0 1 1 0 0 0 1 0 0 1 1     (2,835) Product

Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.
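M = 1 0 1 1 0 1, Q = 1 1 1 1 1 1
Summands: A = M, B = M shifted left 1, C = M shifted left 2, D = M shifted left 3, E = M shifted left 4, F = M shifted left 5.
Each carry vector is weighted one bit position to the left of the sum vector produced with it.

Level 1 (in parallel):
A, B, C     ->  S1 = 1 1 0 0 0 0 1 1 (bits 7..0),    C1 = 0 0 1 1 1 1 0 0 (bits 8..1)
D, E, F     ->  S2 = 1 1 0 0 0 0 1 1 (bits 10..3),   C2 = 0 0 1 1 1 1 0 0 (bits 11..4)

Level 2:
S1, C1, S2  ->  S3 = 1 1 0 1 0 1 0 0 0 1 1 (bits 10..0),   C3 = 0 0 0 0 1 0 1 1 0 0 0 (bits 11..1)

Level 3:
S3, C3, C2  ->  S4 = 0 1 0 1 1 1 0 1 0 0 1 1 (bits 11..0),   C4 = 0 1 0 1 0 1 0 0 0 0 0 (bits 11..1)

Final addition:
S4 + C4 = 1 0 1 1 0 0 0 1 0 0 1 1 = Product

Figure 6.18. The multiplication example from Figure 6.17 performed using
carry-save addition.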
Integer Division
Manual Division

      21                      10101
13 ) 274           1101 ) 100010010
     26                    1101
      14                    10000
      13                     1101
       1                       1110
                               1101
                                  1

Longhand division examples.


Longhand Division Steps
⚫ Position the divisor appropriately with respect to the
dividend and perform a subtraction.
⚫ If the remainder is zero or positive, a quotient bit of 1
is determined, the remainder is extended by another
bit of the dividend, the divisor is repositioned, and
another subtraction is performed.
⚫ If the remainder is negative, a quotient bit of 0 is
determined, the dividend is restored by adding back
the divisor, and the divisor is repositioned for another
subtraction.
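Circuit Arrangement

[Figure: register A (bits an an-1 ... a0) and register Q (bits qn-1 ... q0, holding the dividend and then the quotient) shift left together; an (n+1)-bit adder adds or subtracts the divisor M (0 mn-1 ... m0) under the control sequencer, which also performs the quotient setting of q0.]

Figure 6.21. Circuit arrangement for binary division.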


Restoring Division
⚫ Shift A and Q left one binary position
⚫ Subtract M from A, and place the answer back in A
⚫ If the sign of A is 1, set q0 to 0 and add M back to A
(restore A); otherwise, set q0 to 1
⚫ Repeat these steps n times
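
A minimal sketch of the restoring steps listed above, assuming unsigned operands and a nonzero divisor. For simplicity it lets a Python signed integer play the role of the (n+1)-bit register A, so "the sign of A is 1" becomes `a < 0`; the function name `restoring_divide` is illustrative.

```python
def restoring_divide(dividend, divisor, n):
    """Unsigned n-bit restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # shift A and Q left one binary position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                  # subtract M from A
        if a < 0:
            a += m              # restore A; q0 stays 0
        else:
            q |= 1              # set q0 to 1
    return q, a

print(restoring_divide(0b1000, 0b11, 4))   # 8 / 3
```

At the end, Q holds the quotient and A holds the remainder.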
Example: 1000 / 11 (8 / 3), M = 0 0 0 1 1

Longhand:
       10
11 ) 1000
     11
       10

Operation                  A            Q          Cycle
Initially                  0 0 0 0 0    1 0 0 0
Shift                      0 0 0 0 1    0 0 0 _    First cycle
Subtract (add 1 1 1 0 1)   1 1 1 1 0
Set q0, Restore (add 11)   0 0 0 0 1    0 0 0 0
Shift                      0 0 0 1 0    0 0 0 _    Second cycle
Subtract (add 1 1 1 0 1)   1 1 1 1 1
Set q0, Restore (add 11)   0 0 0 1 0    0 0 0 0
Shift                      0 0 1 0 0    0 0 0 _    Third cycle
Subtract (add 1 1 1 0 1)   0 0 0 0 1
Set q0                     0 0 0 0 1    0 0 0 1
Shift                      0 0 0 1 0    0 0 1 _    Fourth cycle
Subtract (add 1 1 1 0 1)   1 1 1 1 1
Set q0, Restore (add 11)   0 0 0 1 0    0 0 1 0

Remainder = 0 0 0 1 0, Quotient = 0 0 1 0

Figure 6.22. A restoring-division example.


Nonrestoring Division
⚫ Avoid the need for restoring A after an
unsuccessful subtraction.
⚫ Any idea?
⚫ Step 1: (Repeat n times)
If the sign of A is 0, shift A and Q left one bit position and
subtract M from A; otherwise, shift A and Q left and add M
to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
⚫ Step2: If the sign of A is 1, add M to A
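
The two steps above can be sketched as follows, again assuming unsigned operands, a nonzero divisor, and a Python signed integer standing in for the (n+1)-bit register A. The function name `nonrestoring_divide` is hypothetical.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Unsigned n-bit nonrestoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        msb_q = (q >> (n - 1)) & 1
        if a >= 0:
            a = 2 * a + msb_q - m     # shift left, then subtract M
        else:
            a = 2 * a + msb_q + m     # shift left, then add M
        q = (q << 1) & mask
        if a >= 0:
            q |= 1                    # sign of A is 0: set q0 to 1
    if a < 0:
        a += m                        # Step 2: restore the remainder once
    return q, a

print(nonrestoring_divide(0b1000, 0b11, 4))   # 8 / 3
```

Compared with restoring division, each cycle performs exactly one addition or subtraction, with at most one extra addition at the very end.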
Example: 1000 / 11 (8 / 3), M = 0 0 0 1 1

Operation                  A            Q          Cycle
Initially                  0 0 0 0 0    1 0 0 0
Shift                      0 0 0 0 1    0 0 0 _    First cycle
Subtract (add 1 1 1 0 1)   1 1 1 1 0
Set q0                     1 1 1 1 0    0 0 0 0
Shift                      1 1 1 0 0    0 0 0 _    Second cycle
Add (0 0 0 1 1)            1 1 1 1 1
Set q0                     1 1 1 1 1    0 0 0 0
Shift                      1 1 1 1 0    0 0 0 _    Third cycle
Add (0 0 0 1 1)            0 0 0 0 1
Set q0                     0 0 0 0 1    0 0 0 1
Shift                      0 0 0 1 0    0 0 1 _    Fourth cycle
Subtract (add 1 1 1 0 1)   1 1 1 1 1
Set q0                     1 1 1 1 1    0 0 1 0

Quotient = 0 0 1 0

Restore remainder:
  1 1 1 1 1
+ 0 0 0 1 1   (Add)
  0 0 0 1 0   Remainder

A nonrestoring-division example.
Floating-Point Numbers
and
Operations
Why floating point numbers?
The maximum value that can be represented using 32 bits as an unsigned integer is
4,294,967,295, which is equal to 2^32 − 1.

For scientific calculations, we sometimes need a very big number like
6.0247 x 10^23 or a very small number like 9.1 x 10^(-31).

For representing such numbers 32 bits are not sufficient; our conventional methods would need a
much larger number of bits.

Solution:
IEEE 754 Standard: Single Precision(32 bits)
Double Precision(64 bits)
Introduction to basic terms

For a number 6.0247 x 10^23

It consists of the following components:

a. Mantissa: [Significant Digits] 6.0247
b. Base: 10
c. Exponent: 23
d. Scale factor: 10^23
IEEE notation
IEEE Floating Point notation is the standard representation in use. There are two
representations:
- Single precision.
- Double precision.
Both have an implied base of 2.
Single precision:
- 32 bits (23-bit mantissa, 8-bit exponent in excess-127 representation)
Double precision:
- 64 bits (52-bit mantissa, 11-bit exponent in excess-1023 representation)
Fractional mantissa, with an implied binary point at immediate left.

Sign Exponent Mantissa


1 8 or 11 23 or 52
Biased Exponent:
Using 8 bits as a signed number, we can have exponent values in the range -2^7 to +(2^7 - 1).

A biased exponent means that we do not store the exponent as a signed number; we store it
only as a positive number.
So to get only +ve values, we add a bias to the actual exponent, and the result is stored as
the exponent of the number.

Say the exponent is -5; it will be stored as -5 + bias.

Bias is taken as 2^(k-1) if the exponent is represented using k bits. [In general]

Biased Exponent:
In IEEE single precision, the 8-bit exponent field can hold values from 0 to 255.

But 0 and 255 are used for representing special numbers.

So, we store exponents in the range 1 to 254.

So we take the bias as 2^(k-1) - 1 if the exponent is represented using k bits
(for k = 8, 2^7 - 1 = 127).

Biased Exponent:
Original Exponent Biased (Excess-127) Exponent
-126 1
-125 2
....
.......
127 254

Implicit Normalized Number

Say a number is 1001.01

To perform implicit normalization, we move the binary point so that it sits immediately to the
right of the first 1 in the number.

1001.01 = 1.00101 x 2^3
Steps to convert a given decimal number into IEEE Format
Step 1: Convert the number into binary.

Step 2: Normalize (implicit) the number.
If the binary point is moved to the left by i positions, multiply the number by 2^i.
If the binary point is moved to the right by i positions, multiply the number by 2^(-i).

Step 3: If the number is +ve, S=0; else S=1.

Step 4: Add 127 to the exponent, then write it in binary (8 bits) for E’.

Step 5: The mantissa is whatever is to the right of the binary point in the
normalized number. If it is shorter than 23 bits, append zeros to the right of the
actual mantissa to make its length 23 bits.
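
The five steps can be sketched in code. This is a simplified illustration, not a full IEEE-754 encoder: it assumes a nonzero value in the normalized range, truncates the mantissa instead of rounding, and uses the standard-library `math.frexp` to do the normalization of Step 2. The function name `encode_single` is hypothetical.

```python
import math

def encode_single(value):
    """Return the 32-bit single-precision pattern of a nonzero normalized value.

    Sketch only: the mantissa is truncated, and zero, denormalized numbers,
    infinity and NaN are not handled.
    """
    sign = 1 if value < 0 else 0                   # Step 3
    frac, exp = math.frexp(abs(value))             # abs(value) = frac * 2**exp, 0.5 <= frac < 1
    significand, exponent = frac * 2, exp - 1      # Step 2: 1.M x 2^exponent
    e_biased = exponent + 127                      # Step 4
    mantissa = int((significand - 1.0) * (1 << 23))   # Step 5: 23 bits right of the point
    return (sign << 31) | (e_biased << 23) | mantissa

print(hex(encode_single(28.0)))
print(hex(encode_single(-4.0)))
```

For example, 28 = 1.11 x 2^4 gives S = 0, E’ = 131 = 10000011 and M = 1100...0.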
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the value represented by the following 32 bits in IEEE-754
representation?

S=0
E’= 10000011
M=11000....................00

Value represented = (-1)^0 x 1.11 x 2^(131-127)   [10000011 = 131]
= +1.11 x 2^4
= (11100)2
= +(28)10
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the value represented by the following 32 bits in IEEE-754
representation?

S=0
E’= 10000000
M=11000....................00

Value represented = (-1)^0 x 1.11 x 2^(128-127)   [10000000 = 128]
= +1.11 x 2^1
= (11.1)2
= +(3.5)10
Value Represented
Value represented = (-1)^S x 0.M x 2^(-126)
What is the value represented by the following 32 bits in IEEE-754
representation?

S=0
E’= 00000000
M=11000....................00 ≠ 0 => Denormalized number

Value represented = (-1)^0 x 0.11 x 2^(-126)
= 11.0 x 2^(-2) x 2^(-126)
= 3 x 2^(-128)
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the maximum value that can be represented by the 32 bits in IEEE-754
representation?

S=0
E’= 11111110
M=11111..................11

Value represented = (-1)^0 x 1.11...1 x 2^(254-127)
= 111...1 (24 ones) x 2^(-23) x 2^127
= (2^24 - 1) x 2^104
≈ 2^128
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the minimum (most negative) value that can be represented by the 32 bits in IEEE-754
representation?

S=1
E’= 11111110
M=11111..................11

Value represented = (-1)^1 x 1.11...1 x 2^(254-127)
= -111...1 (24 ones) x 2^(-23) x 2^127
= -(2^24 - 1) x 2^104
≈ -2^128
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the minimum positive normalized value that can be represented by the 32 bits in
IEEE-754 representation?

S=0
E’= 00000001
M=0000.....0000

Value represented = (-1)^0 x 1.00...0 x 2^(1-127)
= 1 x 2^(-126)
= 2^(-126)
Value Represented
Value represented = (-1)^S x 1.M x 2^(E’-127)
What is the maximum negative normalized value (the negative value closest to zero) that can be
represented by the 32 bits in IEEE-754 representation?

S=1
E’= 00000001
M=0000.....0000

Value represented = (-1)^1 x 1.00...0 x 2^(1-127)
= -1 x 2^(-126)
= -2^(-126)
SPECIAL VALUES
S     E’                          M            Number
0     00000000                    000...0      +0
1     00000000                    000...0      -0
0     11111111                    000...0      +∞
1     11111111                    000...0      -∞
0/1   11111111                    M≠0          NaN
0/1   00000000                    M≠0          Denormalized Number
0/1   E’≠00000000, E’≠11111111    M=XXX...X    Implicit Normalized Number
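
The table above, together with the value formulas for normalized and denormalized numbers, can be turned into a small decoder. This is a hedged sketch: it takes the 32-bit pattern as a Python integer and returns a label plus the value as a Python float; the function name `decode_single` and the label strings are illustrative.

```python
def decode_single(word):
    """Classify a 32-bit IEEE-754 single-precision pattern and compute its value."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF          # stored exponent E'
    m = word & 0x7FFFFF              # 23-bit mantissa field M
    sign = -1.0 if s else 1.0
    if e == 0xFF:
        if m:
            return "NaN", float("nan")
        return "infinity", sign * float("inf")
    if e == 0:
        if m == 0:
            return "zero", sign * 0.0
        # denormalized: (-1)^S x 0.M x 2^(-126)
        return "denormalized", sign * (m / 2**23) * 2.0**-126
    # implicit normalized: (-1)^S x 1.M x 2^(E'-127)
    return "normalized", sign * (1 + m / 2**23) * 2.0**(e - 127)

print(decode_single(0x41E00000))   # S=0, E'=10000011, M=1100...0
print(decode_single(0x00600000))   # S=0, E'=00000000, M=1100...0
```

The two calls correspond to the worked examples with E’ = 10000011 and E’ = 00000000 and the mantissa 1100...0.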
Denormalized Number
Denormalized Number: a very small number that cannot be stored in implicit normalized form [for single precision].

Let us say the number is 0.0000000...11, with 131 places after the binary point.

After implicit normalization =>

1.1 x 2^(-130)

M=1
E’ = -130 + 127 = -3

In the conventional (normalized) method this number cannot be stored, because the biased exponent E’ comes out negative.

But in the IEEE standard, it is treated as a special number.
Denormalized Number
The IEEE-754 standard says that for such a small number, the binary point is moved only until the exponent reaches -126.

Let us say the number is 0.0000000...011, with 130 places after the binary point.

After moving the binary point right by 126 places =>

0.0011 x 2^(-126)

M=0011
E’=00000000
This is an example of a denormalized number.

Value formula for a denormalized number: (-1)^S x 0.M x 2^(-126), where 126 = bias - 1

GATE QUESTION 2021
Peculiarities of IEEE notation
•Floating point numbers have to be represented in a normalized form to
maximize the use of available mantissa digits.
•In a base-2 representation, this implies that the MSB of the mantissa is
always equal to 1.
•If every number is normalized, then the MSB of the mantissa is always 1.
We can do away without storing the MSB.
•IEEE notation assumes that all numbers are normalized so that the MSB
of the mantissa is a 1 and does not store this bit.
•So the MSB of the stored mantissa field in the IEEE notation can be either a 0 or a 1.
•The values of the numbers represented in the IEEE single precision
notation are of the form:
(+,-) 1.M x 2^(E’ - 127)
•The hidden 1 forms the integer part of the mantissa.
•Note that excess-127 and excess-1023 (not excess-128 or excess-1024) are used
to represent the exponent.
Exponent field
In the IEEE representation, the exponent is in excess-127 (excess-1023)
notation.
The actual exponents represented are:

-126 <= E <= 127 and -1022 <= E <= 1023

not

-127 <= E <= 128 and -1023 <= E <= 1024

This is because IEEE reserves the exponents -127 and 128 (and -1023 and
1024), that is, the stored values 0 and 255 (0 and 2047 in double precision), to represent special conditions:
- Exact zero
- Infinity
Floating point Arithmetic
Addition:
3.1415 x 10^8 + 1.19 x 10^6 = 3.1415 x 10^8 + 0.0119 x 10^8 = 3.1534 x 10^8
Multiplication:
(3.1415 x 10^8) x (1.19 x 10^6) = (3.1415 x 1.19) x 10^(8+6)

Division:
(3.1415 x 10^8) / (1.19 x 10^6) = (3.1415 / 1.19) x 10^(8-6)
Biased exponent problem:
If a true exponent e is represented in excess-p notation, it is stored as e+p.
Then consider what happens under multiplication:

(a x 10^(x+p)) x (b x 10^(y+p)) = (a x b) x 10^(x+p+y+p) = (a x b) x 10^(x+y+2p)

Representing the result in excess-p notation implies that the exponent
should be x+y+p. Instead it is x+y+2p.
Biases must therefore be handled explicitly in floating point arithmetic.
Floating point arithmetic: ADD/SUB rule

⚫ Choose the number with the smaller exponent.


⚫ Shift its mantissa right until the exponents of both the numbers are equal.
⚫ Add or subtract the mantissas.
⚫ Determine the sign of the result.
⚫ Normalize the result if necessary and truncate/round to the number of
mantissa bits.

Note: This does not consider the possibility of overflow/underflow.


Floating point arithmetic: ADD/SUB rule
Number 1:                 Number 2:
S=0                       S=0
E’= 1000 0100             E’= 1000 0011
M=000111100.....0         M=0100100...00000

Number 1:                 Number 2:
1.0001111 x 2^5           1.01001 x 2^4

Choose the number with the smaller exponent and shift its mantissa right until the
exponents of both the numbers are equal. So number 2 becomes
0.1010010 x 2^5. Now perform the addition:

  1.0 0 0 1 1 1 1 x 2^5
+ 0.1 0 1 0 0 1 0 x 2^5

  1.1 1 0 0 0 0 1 x 2^5
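
The alignment-and-add procedure can be sketched on the stored fields directly. This is a simplified illustration that assumes two positive normalized operands, each given as (biased exponent E’, 23-bit mantissa field M), and truncates any bits shifted out; signs, rounding, overflow and underflow are not handled. The function name `fp_add` is hypothetical.

```python
def fp_add(e1, m1, e2, m2, frac_bits=23):
    """Add two positive normalized numbers given as (E', M); returns (E', M)."""
    hidden = 1 << frac_bits
    sig1, sig2 = hidden | m1, hidden | m2      # restore the hidden 1 of 1.M
    if e1 < e2:                                # keep the larger exponent in e1
        e1, sig1, e2, sig2 = e2, sig2, e1, sig1
    sig2 >>= (e1 - e2)                         # shift the smaller-exponent mantissa right
    total = sig1 + sig2                        # add the aligned mantissas
    if total >= hidden << 1:                   # result of the form 1x.xxx: normalize
        total >>= 1
        e1 += 1
    return e1, total & (hidden - 1)            # drop the hidden 1 again

# Number 1: E' = 1000 0100, M = 0001111 0...0; Number 2: E' = 1000 0011, M = 0100100 0...0
e, m = fp_add(0b10000100, 0b0001111 << 16, 0b10000011, 0b0100100 << 16)
print(bin(e), bin(m >> 16))
```

For the example above the result keeps the exponent of number 1 and has mantissa bits 1100001 followed by zeros, matching 1.1100001 x 2^5.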
Floating point arithmetic: MUL rule
⚫ Add the exponents.
⚫ Subtract the bias.
⚫ Multiply the mantissas and determine the sign of the
result.
⚫ Normalize the result (if necessary).
⚫ Truncate/round the mantissa of the result.
Floating point arithmetic: DIV rule

⚫ Subtract the exponents


⚫ Add the bias.
⚫ Divide the mantissas and determine the sign of the result.
⚫ Normalize the result if necessary.
⚫ Truncate/round the mantissa of the result.

Note: Multiplication and division do not require alignment of the
mantissas the way addition and subtraction do.
GATE QUESTIONS

Consider three registers R1, R2, and R3 that store numbers in IEEE−754 single
precision floating point format. Assume that R1 and R2 contain the values (in
hexadecimal notation) 0x42200000 and 0xC1200000, respectively.

If R3 = R1 / R2, what is the value stored in R3 ?


(A) 0x40800000
(B) 0xC0800000
(C) 0x83400000
(D) 0xC8500000
R1 = 0x42200000
R1 = 0100 0010 0010 0000 ... 0000

R1: S = 0
E’ = 10000100 = 132, so the exponent is 132 - 127 = 5
M = 010000000...0

R1 = 1.0100 x 2^5 = (101000)2 = 40

R2 = 0xC1200000
R2 = 1100 0001 0010 0000 ... 0000

R2: S = 1
E’ = 1000 0010 = 130, so the exponent is 130 - 127 = 3
M = 010000000...0
R2 = -1.0100 x 2^3 = -(1010)2 = -10
So, R1/R2 = 40/(-10) = -4
-4 = -100.0 x 2^0 = -1.00 x 2^2, so S = 1, E’ = 2 + 127 = 129 = 1000 0001, M = 0000000...0

R3 = 1100 0000 1000 0000 ... 0000 = 0xC0800000, which is option (B).


GATE QUESTIONS

A multiplexer is placed between a group of 32 registers and an accumulator to


regulate data movement such that at any given point in time the content of only
one register will move to the accumulator. The number of select lines needed
for the multiplexer is _________ .

(A) 5
(B) 6
(C) 4
(D) 7
GATE QUESTIONS

If there are m input lines n output lines for a decoder that is used to uniquely
address a byte addressable 1 KB RAM, then the minimum value of m+n is
________ .

(A) 18
(B) 1034
(C) 10
(D) 1024
GATE QUESTIONS
A computer system with a word length of 32 bits has a 16 MB
byte-addressable main memory and a 64 KB, 4-way set associative cache
memory with a block size of 256 bytes. Consider the following four physical
addresses represented in hexadecimal notation.

A1 = 0x42C8A4, A2 = 0x546888, A3 = 0x6A289C, A4 = 0x5E4880

Which one of the following is TRUE ?


(A) A1 and A4 are mapped to different cache sets.
(B) A2 and A3 are mapped to the same cache set.
(C) A3 and A4 are mapped to the same cache set.
(D) A1 and A3 are mapped to the same cache set.
GATE QUESTIONS
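Solution:
Number of blocks in the cache = 64 KB / 256 B = 2^8 = 256

Number of sets in the cache = 256 / 4 = 64 = 2^6, i.e., 6 bits for the set index and 8 bits for the byte offset within a block. The main memory is 16 MB = 2^24 bytes, so an address has 24 bits and the remaining 24 - 14 = 10 bits form the tag.

Each address splits as tag (10 bits), set (6 bits), offset (8 bits):

A1 = 0x42C8A4: 0100 0010 1100 1000 1010 0100, set bits = 001000 (set 8)

A2 = 0x546888: 0101 0100 0110 1000 1000 1000, set bits = 101000 (set 40)

A3 = 0x6A289C: 0110 1010 0010 1000 1001 1100, set bits = 101000 (set 40)

A4 = 0x5E4880: 0101 1110 0100 1000 1000 0000, set bits = 001000 (set 8)

A1 and A4 map to set 8, and A2 and A3 map to set 40, so option (B) is TRUE.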
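The set index of each address can also be computed mechanically: drop the block-offset bits, then take the result modulo the number of sets. The sketch below does this in Python; the function name `set_index` and the dictionary layout are illustrative choices, not part of the notes.

```python
def set_index(address: int, cache_size: int, block_size: int, ways: int) -> int:
    """Set index of an address in a set-associative cache."""
    num_sets = cache_size // (block_size * ways)
    block_number = address // block_size   # strip the block-offset bits
    return block_number % num_sets         # low-order bits select the set

addresses = {"A1": 0x42C8A4, "A2": 0x546888, "A3": 0x6A289C, "A4": 0x5E4880}
sets = {name: set_index(a, 64 * 1024, 256, 4) for name, a in addresses.items()}
print(sets)
```

Addresses that share a set index compete for the same four cache lines, which is what the answer options ask about.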


GATE QUESTIONS 2021
THANK YOU
I/O Organization

Bindu Agarwalla

Peripherals
Input devices, output devices, and storage devices that are not directly accessible to the processor.

[Figure: the CPU connected to I/O devices through an interface]
Why do we need an interface?

Peripherals are electromechanical or electromagnetic devices, and their manner of
operation is different from the operation of the CPU and memory, which are electronic
devices. So conversion of signals is required.

The data transfer rate of I/O devices is slower than that of the CPU and memory, so a
synchronization mechanism may be needed.

Data codes and formats in peripherals differ from the word format in the CPU and
memory. So conversion of formats is required.

The operating modes of peripherals are different from each other, and each must be
controlled so that a peripheral does not disturb the operation of other peripherals.
I/O Interface

An I/O device is connected to the bus using an I/O interface circuit which has:
- Address decoder, control circuit, and data and status registers.
I/O Interface
Address decoder decodes the address placed on the address lines thus enabling the
device to recognize its address.

Data register holds the data being transferred to or from the processor.

Status register holds information necessary for the operation of the I/O device.

Data and status registers are connected to the data lines, and have unique addresses.

I/O interface circuit coordinates I/O transfers.


Connection: I/O vs Memory Buses
The CPU can communicate with memory and I/O in three ways:

Separate buses for memory and I/O

Common address bus and data bus, with a separate control bus

Common address bus, data bus, and control bus


Separate Buses for Both Memory and I/O
Common Address and Data Bus and Separate Control Bus

I/O Mapped I/O or Isolated I/O: I/O devices and memory have separate
address spaces.
Common Address, Data Bus and Control Bus

Memory Mapped I/O: From the available memory address space, some
addresses are assigned to I/O devices.
Memory Mapped I/O vs I/O Mapped I/O

Memory Mapped I/O | I/O Mapped I/O
I/O devices do not have a separate address space. | I/O and memory each have their own separate address space.
Some memory address space remains unutilized. | The memory address space is fully utilized.
The same instructions can be used for both memory and I/O devices, e.g., MOV. | Instructions for I/O access and memory access are distinct, e.g., IN and OUT for I/O.
Modes of I/O Transfer

Program Controlled I/O

Interrupt Driven I/O

DMA
Program Controlled I/O
Processor repeatedly monitors a status flag to achieve the necessary
synchronization.

Processor polls the I/O device.

[Figure: keyboard interface with the DATAIN register and the SIN flag in the STATUS register]
Program Controlled I/O
Example: Read a line from a keyboard character by character and store it in
memory starting at location LINE. Stop reading once the 'Enter' key is pressed,
then call a function named PROCESS1 to process the input.

        MOVE      #LINE, R0       // Load the starting address into R0
WAITK:  TESTBIT   #0, STATUS      // Check the SIN flag at bit position 0
        BRANCH=0  WAITK           // If SIN == 0, there is no keystroke yet
        MOVE      DATAIN, R1      // Read the character from DATAIN into R1
        MOVE      R1, (R0)+       // Store it at the location pointed to by R0, then increment R0
        COMPARE   #$0D, R1        // Compare with the ASCII code of the 'Enter' key (carriage return)
        BRANCH≠0  WAITK           // If the key is not 'Enter', continue reading
        CALL      PROCESS1        // Otherwise, call PROCESS1
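The same polling loop can be simulated in Python to see the busy-wait behaviour. This is only a sketch: the `KeyboardInterface` class, its `tick` method, and the rule that reading DATAIN clears SIN are modelling assumptions, not a description of any real device.

```python
SIN = 0x1  # bit 0 of STATUS: a character is waiting in DATAIN

class KeyboardInterface:
    """Toy keyboard interface with DATAIN and STATUS registers (assumed model)."""
    def __init__(self, keystrokes: str):
        self._pending = list(keystrokes)
        self.datain = 0
        self.status = 0

    def tick(self) -> bool:
        """Device side: latch the next keystroke if DATAIN is free."""
        if not (self.status & SIN) and self._pending:
            self.datain = ord(self._pending.pop(0))
            self.status |= SIN
            return True
        return False

    def read_datain(self) -> int:
        """Processor side: reading DATAIN clears SIN (assumption)."""
        self.status &= ~SIN
        return self.datain

def read_line(kbd: KeyboardInterface) -> list:
    line = []                                # plays the role of memory at LINE
    while True:
        while not (kbd.status & SIN):        # WAITK: poll the SIN flag
            if not kbd.tick():
                raise RuntimeError("no more keystrokes")
        ch = kbd.read_datain()               # MOVE DATAIN, R1
        line.append(ch)                      # MOVE R1, (R0)+
        if ch == 0x0D:                       # COMPARE #$0D, R1
            return line                      # fall through to CALL PROCESS1

print(read_line(KeyboardInterface("hi\r")))
```

The inner `while` loop is the cost of this method: the processor does nothing useful while it waits for SIN to become 1.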
Interrupt Driven I/O

Interrupt Driven IO
Polling method: The processor waits for a response from the I/O device. During
the wait period, the processor cannot perform any useful computation.

Interrupts overcome this problem.

An I/O device can alert the processor when it becomes ready. It does
so by sending a hardware signal called an interrupt request to the
processor.

Interrupt Example 1

Consider a task that requires continuous extensive computations to be
performed and the results to be printed on a printer.
Interrupt Driven IO
The COMPUTE routine produces a set of n lines of output.

The PRINT routine sends one line of text to the printer for printing.
After printing one line, the printer sends an interrupt request signal to the
processor (at the i-th line).

The processor then interrupts the execution of the COMPUTE routine and
transfers control to the PRINT routine.
Interrupt Driven IO
The routine executed in response to an interrupt request is called the
interrupt-service routine.
E.g. PRINT routine.
The processor acknowledges the interrupt request by sending a control signal called
Interrupt Acknowledge to the interrupting device.

When an interrupt occurs during program execution, the processor registers in use and
the flag/status information must be saved on the stack before the ISR runs, and restored
before execution of the interrupted program is resumed.

In this way, the original program can continue execution without being affected in
any way by the interruption, except for the time delay.

The task of saving and restoring information can be done automatically by the
processor or by program instructions.
Sequence of events in response to an Interrupt Request

Interrupt Driven IO
The process of saving and restoring registers involves memory transfers that increase
the total execution time, and hence represent execution overhead.

Saving registers also increases the delay between the time an interrupt request is
received and the start of execution of the interrupt-service routine.

This delay is called interrupt latency.

This kind of delay is not acceptable in real-time processing.


The sequence of events involved in handling an interrupt request from a single device

1. The device raises an interrupt request (INTR).

2. The processor completes the execution of the current instruction, interrupts the
program currently being executed, and saves the contents of the PC and the
status/flag register.

[3. Interrupts are disabled by clearing the IF bit in the status/flag register to 0.]

4. The action requested by the interrupt is performed by the interrupt-service routine,
during which time the device is informed that its request has been recognized, and
in response, it deactivates the interrupt-request signal.

5. Upon completion of the interrupt-service routine, the saved contents of the PC and
status registers are restored (enabling interrupts by setting the IF bit to 1), and
execution of the interrupted program is resumed.
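The five steps can be traced with a small Python model. It is a sketch only: the `CPU` class, its attribute names, and the use of a Python callable as the interrupt-service routine are assumptions made for illustration.

```python
class CPU:
    """Toy processor state: program counter, IF bit, and a stack (assumed model)."""
    def __init__(self, pc: int = 0):
        self.pc = pc
        self.interrupts_enabled = True   # the IF bit
        self.stack = []

    def handle_interrupt(self, isr_address: int, isr) -> bool:
        """Service one interrupt request raised by a device (step 1)."""
        if not self.interrupts_enabled:
            return False                                       # request stays pending
        self.stack.append((self.pc, self.interrupts_enabled))  # step 2: save PC and status
        self.interrupts_enabled = False                        # step 3: clear IF
        self.pc = isr_address
        isr(self)                                              # step 4: run the ISR
        self.pc, self.interrupts_enabled = self.stack.pop()    # step 5: restore and resume
        return True

cpu = CPU(pc=100)
trace = []
cpu.handle_interrupt(500, lambda c: trace.append((c.pc, c.interrupts_enabled)))
print(trace, cpu.pc, cpu.interrupts_enabled)
```

Inside the ISR the IF bit is 0, and after the return the saved status restores it to 1, which matches steps 3 and 5.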
Interrupt Hardware

INTR = INTR1 + INTR2 + ... + INTRn

INTR is an active-low signal.


Enabling and Disabling of Interrupts
A device that raises an interrupt request by activating the interrupt-request signal
keeps the signal activated until it learns that the processor has accepted its request.
This means that the interrupt-request signal will be active during execution of the
interrupt-service routine.

It is essential to ensure that this active request signal does not lead to successive
interruptions, causing the system to enter an infinite loop from which it cannot
recover.

In some situations interrupts have to be ignored, e.g., a PRINT interrupt from the
printer cannot be serviced by the processor if COMPUTE is not ready with
text to print.
Enabling and Disabling of Interrupts
First Possibility:
Ignore the interrupt request until the completion of the current ISR.

The processor hardware ignores the interrupt-request line until the execution of
the first instruction of the ISR has been completed.

The first instruction of the ISR is Interrupt Disable (DI) and the last instruction
before the return is Interrupt Enable (EI), so no further interrupts are accepted
while the ISR runs.

ISR
DI
.....
......
.....
EI
IRET
Enabling and Disabling of Interrupts
Second Possibility:
The processor automatically disables interrupts before starting the execution of
the ISR.

On entry to an ISR:
1. The processor first saves the contents of the program counter (PC) and the
processor status (PS) register, with IF=1, on the stack.
2. It then automatically disables interrupts before starting the execution of the
interrupt-service routine.
On exit from an ISR:
When the return-from-interrupt instruction is executed, the contents of the PC and PS
are popped from the stack. The restored PS has IF=1, so interrupts are enabled again.
Enabling and Disabling of Interrupts
Third Possibility:
The processor responds only to the leading edge of the signal on the INTR line
(an edge-triggered line).

The processor receives only one request regardless of how long the line is activated.

So there is no danger of multiple interruptions and no need for explicit instructions
to enable or disable interrupts.
Handling Multiple Interrupt Requests

Handling Multiple Requests

When a number of devices can send interrupt requests on a common line INTR
connected to the processor, how can the processor determine which device is
requesting an interrupt?
Solution: POLLING
When an interrupt request is received, it is necessary to identify the particular device
that has raised the request.

The information needed to determine whether a device is requesting an interrupt is
available in its status register.
When a device raises an interrupt request, it sets one of the bits in its status register
(the IRQ bit) to 1.

The simplest way to identify the interrupting device is to have the ISR poll all the I/O
devices connected to the bus.

The first device encountered with its IRQ bit set is the device that should be serviced.
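The polling step can be sketched in Python as a scan over the status registers. The position of the IRQ bit (`irq_mask`) and the list representation of the devices are assumptions chosen for illustration.

```python
def find_interrupting_device(status_registers, irq_mask: int = 0x1):
    """Return the index of the first device whose IRQ bit is set, or None."""
    for index, status in enumerate(status_registers):
        if status & irq_mask:      # this device raised the request
            return index
    return None                    # no device is requesting service

# Devices 2 and 3 are both requesting; polling order decides who is served first.
print(find_interrupting_device([0, 0, 1, 1]))
```

Because the scan always starts from the first device, the polling order also fixes an implicit priority among devices that request service at the same time.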
Handling Multiple Requests
The Polling method is easy to implement.

The main drawback is the time spent in interrogating the IRQ bits of all the devices
that may not be requesting any service.

So the next solution is Vectored Interrupt.


Handling Multiple Requests
Given that different devices are likely to require different interrupt-service routines,
how can the processor obtain the starting address of the appropriate routine in each
case?

Solution: Vectored Interrupt


To reduce the time involved in the polling process, a device requesting an interrupt
may identify itself directly to the processor. Then, the processor can immediately start
executing the corresponding ISR.

A device requesting an interrupt can identify itself if it has its own interrupt-request
signal, or if it can send a special code (such as the memory address of its ISR) to the
processor.

A commonly used scheme is to allocate permanently an area in the memory to hold the
addresses of interrupt-service routines. These addresses are usually referred to as
interrupt vectors, and they are said to constitute the interrupt-vector table.
Handling Multiple Requests
When an interrupt request arrives, the information provided by the requesting
device is used as a pointer into the interrupt-vector table, and the address in the
corresponding interrupt vector is automatically loaded into the program counter.
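A minimal Python sketch of the table lookup follows. The table base address, the 4-byte entry size, and the dictionary used as memory are all assumptions; real processors define their own layout.

```python
VECTOR_TABLE_BASE = 0x0000   # assumed location of the interrupt-vector table
ENTRY_SIZE = 4               # assumed size of one interrupt vector in bytes

memory = {}                  # toy memory: address -> stored word

def install_vector(vector_code: int, isr_address: int) -> None:
    """Store the starting address of an ISR in the table entry for a device code."""
    memory[VECTOR_TABLE_BASE + ENTRY_SIZE * vector_code] = isr_address

def dispatch(vector_code: int) -> int:
    """Return the value loaded into the PC for the code sent by the device."""
    return memory[VECTOR_TABLE_BASE + ENTRY_SIZE * vector_code]

install_vector(3, 0x2000)    # hypothetical device with code 3
install_vector(5, 0x3400)    # hypothetical device with code 5
pc = dispatch(5)
print(hex(pc))
```

No scan over devices is needed: the code supplied by the device selects the table entry directly, which is the time saving over polling.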

Some Points
I/O devices send the interrupt-vector code over the data bus.

When a device sends an interrupt request, the processor may not be ready to receive the
interrupt-vector code: interrupts may be disabled by the processor at that moment,
or the data bus may be in use by the processor to complete the execution of the
current instruction.

So, when the processor is ready to receive the interrupt-vector code, it sends the
INTA signal to the device.

Only after receiving the INTA signal does the device place the interrupt-vector
code on the data bus.
Nesting of Interrupt Requests and Simultaneous Interrupt Requests

Interrupt Nesting
When requests come from more than one device, some devices need an immediate response
from the processor, e.g., the system clock in a real-time system.
An interrupt request from a higher-priority device should be accepted even while the
ISR of a lower-priority device is executing.

This can be achieved by using priority-based interrupts, organized as a
multiple-level priority scheme.

Interrupt requests will be accepted from some devices but not from others, depending
upon the device’s priority. To implement this scheme, we can assign a priority level to
the processor that can be changed under program control.

The processor’s priority can be encoded in a few bits of the processor status register.
Interrupt Nesting
A multiple-priority scheme can be implemented using individual INTR and INTA lines
for each device.

Priority arbitration circuit: A logic circuit which combines all interrupts but
allows only the highest-priority request.
Interrupt Nesting
There are two types of priority scheme:
Fixed Priority: A lower number indicates higher priority.
Rotating Priority: Once a device is serviced, it becomes the lowest-priority
device, and the device next in sequence becomes the highest-priority device.
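The difference between the two schemes can be illustrated with a short Python sketch. Devices are assumed to be numbered 0 to n-1 and pending requests are held in a set; both choices and the function names are illustrative.

```python
def fixed_priority(requests):
    """Fixed scheme: the lowest device number always wins."""
    return min(requests) if requests else None

def rotating_priority(requests, last_serviced: int, num_devices: int):
    """Rotating scheme: search starts just after the device serviced last."""
    for offset in range(1, num_devices + 1):
        candidate = (last_serviced + offset) % num_devices
        if candidate in requests:
            return candidate
    return None

pending = {0, 1, 3}
print(fixed_priority(pending))                                   # device 0 wins
print(rotating_priority(pending, last_serviced=1, num_devices=4))  # device 3 wins
```

With fixed priority a busy low-numbered device can starve the others, whereas rotation gives every device a turn.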
Handling Multiple Devices
[Simultaneous Requests]
Daisy Chain

INTR line is common to all the devices.

INTA is connected in daisy chain fashion where INTA signal propagates serially
through the devices.
Handling Multiple Devices
[Simultaneous Requests]
When an interrupt request is active on the INTR line, the processor sends INTA to
device 1. Device 1 passes the signal on to device 2 only if it does not require service.

In daisy chain, the device that is electrically closest to the processor will
have the highest priority.
Handling Multiple Devices
The multiple-priority and daisy-chain schemes can be combined: devices are organized
in groups, each group has a different priority level, and the devices within a group
are connected in a daisy chain.

Direct Memory Access
To transfer large blocks of data at high speed, an alternative approach is used.

A special control unit is provided to allow transfer of a block of data directly
between an external device and the main memory, without continuous
intervention by the processor. This approach is called DMA.

DMA transfers are performed by a control circuit that is part of the I/O device
interface, called the DMA controller.

For each word transferred, the DMA controller provides the memory address and all
the bus signals that control data transfer.

Since it has to transfer blocks of data, the DMA controller must
increment the memory address for successive words and keep track of the number
of transfers.
Registers used in DMA Operation
The DMA controller has a number of registers that are accessed by the processor to
initiate transfer operations.
One register is used for storing the starting address to/from which the
transfer will take place.

The word count register is used for storing the COUNT of units to be transferred.

The third register contains status and control flags.

The R/W bit determines the direction of the transfer.
When this bit is set to 1 by a program instruction, the controller performs a
read operation, that is, it transfers data from the memory to the I/O device;
otherwise, it performs a write operation.

When the controller has completed transferring a block of data and is ready to
receive another command, it sets the Done flag to 1.

Bit 30 is the Interrupt-enable flag, IE. When this flag is set to 1, it causes the
controller to raise an interrupt after it has completed transferring a block of data.

The controller sets the IRQ bit to 1 when it has requested an interrupt.
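The flags of the status and control register can be decoded with bit masks, as in the Python sketch below. Only the position of IE (bit 30) is stated in these notes; the positions used here for R/W (bit 0), Done (bit 1) and IRQ (bit 31) are assumptions for illustration and may differ in a given controller.

```python
RW_BIT = 1 << 0      # assumed position: 1 = read (memory to I/O device)
DONE_BIT = 1 << 1    # assumed position
IE_BIT = 1 << 30     # interrupt-enable flag, bit 30 as stated in the notes
IRQ_BIT = 1 << 31    # assumed position

def describe(status: int) -> dict:
    """Decode the flags of a 32-bit DMA status and control register."""
    return {
        "direction": "read" if status & RW_BIT else "write",
        "done": bool(status & DONE_BIT),
        "interrupt_enabled": bool(status & IE_BIT),
        "irq": bool(status & IRQ_BIT),
    }

# Processor programs a read transfer with the completion interrupt enabled.
command = RW_BIT | IE_BIT
# Controller finishes the block and raises the interrupt.
finished = command | DONE_BIT | IRQ_BIT
print(describe(command))
print(describe(finished))
```

The processor writes R/W and IE when it starts a transfer, while Done and IRQ are set by the controller when the block is complete.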
DMA Operation
DMA Transfer
When a transfer of data between a peripheral and memory is needed, the peripheral
places its request with the DMA controller attached to it.

The DMA controller sends the Bus Request (HOLD) signal to the CPU, asking it to release
control of the system bus.

The CPU responds by completing the execution of the current instruction.

The CPU initializes the DMA controller by sending the following information
through the data bus.

1. The starting address of the memory block where the data are available (for read)
or where the data need to be stored (for write).

2. The word count, which is the number of units (words or bytes) in the memory block.

3. Control information to specify the mode of transfer, such as read or write.

Then the CPU sends the Bus Grant (HLDA) signal to the DMAC, and releases
control of the system bus.
DMA Transfer
After receiving the HLDA, the DMA Controller informs the peripheral and starts
the operation.

It continues to transfer data between memory and peripheral unit until the entire
block is transferred.

For each transfer, memory address is incremented and the word count is
decremented.

When the count becomes zero, no more transfers take place.

After the transfer is over, the DMA controller sends an interrupt request to the
processor.

In response to this interrupt, the processor takes back control of the system bus.
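The address and count bookkeeping of a block transfer can be sketched in Python. The function below models only a device-to-memory (write) transfer; the list used as memory and the returned tuple are illustrative assumptions.

```python
def dma_transfer(memory: list, start_address: int, word_count: int, device_words):
    """Copy word_count words from the device into memory, one word per bus cycle."""
    address, count = start_address, word_count
    source = iter(device_words)
    while count != 0:
        memory[address] = next(source)   # one word moves over the bus
        address += 1                     # memory address is incremented
        count -= 1                       # word count is decremented
    interrupt_request = True             # raised once the count reaches zero
    return address, count, interrupt_request

ram = [0] * 8
result = dma_transfer(ram, start_address=2, word_count=3, device_words=[7, 8, 9])
print(ram, result)
```

The processor is not involved inside the loop; it only sets up the starting address and count beforehand and handles the interrupt at the end.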
Modes of DMA Transfer: Cycle Stealing and Burst Mode

In Cycle Stealing, the DMA Controller takes the control of buses from CPU for
transferring one word of data in one cycle and returns the control to the CPU.

Alternatively, the DMA controller may be given exclusive access to the main
memory to transfer a block of data without interruption. This is known as block or
burst mode.
Problem [GATE 2016]
The size of the data count register of a DMA controller is 16 bits.
The processor needs to transfer a file of 29,154 kilobytes from disk to main
memory.
The memory is byte addressable. The minimum number of times the DMA
controller needs to get the control of the system bus from the processor to
transfer the file from the disk to main memory is ________ .

Solution:
A 16-bit count register can hold a maximum count of 2^16 - 1 = 65,535, so one DMA
transfer moves at most 65,535 bytes (just under 2^6 x 1024 bytes = 64 KB).

Number of transfers = ceil(29,154 x 1024 / 65,535) = ceil(455.5) = 456
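The arithmetic can be checked with a few lines of Python. The function name `dma_bus_requests` is hypothetical; the calculation assumes each bus acquisition moves at most the largest count a register of the given width can hold.

```python
def dma_bus_requests(file_bytes: int, count_register_bits: int) -> int:
    """Minimum number of DMA transfers needed to move file_bytes."""
    max_per_transfer = 2 ** count_register_bits - 1   # largest count the register holds
    return -(-file_bytes // max_per_transfer)         # ceiling division

file_bytes = 29154 * 1024
transfers = dma_bus_requests(file_bytes, 16)
print(file_bytes, transfers)
```

The ceiling is needed because the final partial block still requires one more acquisition of the bus.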


Problem [GATE 2016]
On a non-pipelined sequential processor, a program segment which is part
of the interrupt service routine is given to transfer 500 bytes from the I/O
device to memory.
Assume that each statement in this program is equivalent to a machine
instruction which takes one clock cycle to execute if it is a non-load/store
instruction. The load/store instructions take two clock cycles to execute.

      Initialize the address register
      Initialize the count to 500
LOOP: Load a byte from the device
      Store in memory at the address given by address register
      Increment the address register
      Decrement the count
      If count != 0 go to LOOP

The designer of the system also has the alternative approach of using a DMA
controller to implement the same transfer. The DMA controller requires 20 clock
cycles for initialization and other overheads. Each DMA transfer cycle takes two
clock cycles to transfer one byte of data from the device to memory.
Problem [GATE 2016]
What is the approximate speed up when the DMA controller based design is
used in place of interrupt-driven program based input-output?

Solution:
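With the interrupt technique,
the time taken is: 1 + 1 + (2 + 2 + 1 + 1 + 1) x 500 CC = 3502 CC
(two initialization instructions, then 7 cycles per loop iteration: load 2, store 2,
increment 1, decrement 1, branch 1)

With the DMA technique,
the time taken is: 20 + 2 x 500 CC = 1020 CC

Speedup = 3502 / 1020 = 3.43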
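The two cycle counts and the speedup can be reproduced with a short Python sketch. The function names are illustrative; the per-instruction costs are taken from the problem statement.

```python
def programmed_io_cycles(num_bytes: int) -> int:
    """Clock cycles for the interrupt-service-routine loop."""
    init = 1 + 1                     # initialize address register and count
    per_byte = 2 + 2 + 1 + 1 + 1     # load, store, increment, decrement, branch
    return init + per_byte * num_bytes

def dma_cycles(num_bytes: int) -> int:
    """Clock cycles for the DMA-based transfer."""
    return 20 + 2 * num_bytes        # setup overhead plus 2 cycles per byte

isr_time = programmed_io_cycles(500)
dma_time = dma_cycles(500)
speedup = isr_time / dma_time
print(isr_time, dma_time, round(speedup, 2))
```

For large transfers the speedup approaches 7 / 2 = 3.5, since the fixed overheads become negligible compared with the per-byte cost.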


RISC Vs CISC

RISC | CISC
An instruction set architecture designed to perform a smaller number of computer instructions so that it can operate at a higher speed. | A full set of computer instructions that intends to provide the necessary capabilities in an efficient way.
More registers | Fewer registers
Instructions have simple, fixed formats with few addressing modes. | Instructions have variable formats with several complex addressing modes.
Has simple instructions; the program length is long. | Has complex instructions; the program length is short.
Uses a hardwired control unit; used in applications such as mobile phones and tablets. | Uses a microprogrammed control unit; used in applications such as desktop computers and laptops.
