
CS6303 – COMPUTER ARCHITECTURE

VI SEMESTER

STUDY MATERIAL

CS6303
COMPUTER ARCHITECTURE

ANNA UNIVERSITY

REGULATION 2013

PREPARED BY V.BALAMURUGAN, ASST.PROF/IT

“Chase Your Dreams; Dreams Do Come True” – Sachin Tendulkar


CS6303 – COMPUTER ARCHITECTURE

UNIT 1 – OVERVIEW AND INSTRUCTIONS

PART – A

1. List the eight ideas invented by computer architects.
 Design for Moore's Law
 Use abstraction to simplify design
 Make the common case fast
 Performance via parallelism
 Performance via pipelining
 Performance via prediction
 Hierarchy of memories
 Dependability via redundancy

2. What is pipelining?
Pipelining is a set of data processing elements connected in series, where the output of one element is the input of the next element.

3. What are the major hardware components?
 Input unit (keyboard, mouse, etc.)
 CPU (memory unit, ALU, control unit)
 Output unit (monitor, printer, speaker, etc.)

4. What are the CPU and the ALU?
CPU: Central Processing Unit
 It is also called the brain of the computer.
 Input and output devices work according to the CPU.
ALU: Arithmetic Logic Unit
 It performs arithmetic and logical operations.
 It is present inside the CPU.
 It uses main memory (RAM) for its operations.

5. What is the control unit?
 It is present inside the CPU.
 It controls the operation of the input unit, the output unit and the ALU.
 It has overall control of the computer.
 It tells the memory unit when to send/receive data.
 It tells the ALU what operation to perform.

6. What are response time and throughput?
 The time between the starting and ending of a task is called the response time. It is also called the "execution time".
 The total amount of work done in a given time is called the throughput.

7. What is CPU time?
 The amount of time the CPU spends doing a task is called the CPU time.
 It is also called the CPU execution time.
 Time spent waiting for I/O is not included.
 CPU time = CPU time spent in the program (user CPU time) + CPU time spent in the OS (system CPU time)

8. Write down the formula for the power consumed by the CPU.
The formula for finding the power consumed by the CPU is
P = C × V² × f
where P = power, C = capacitive loading, V = voltage, f = frequency.

9. What are multiprocessor systems? Give their advantages.
 Computer systems that contain more than one processor are called multiprocessor systems.
 They execute more than one application in parallel.
 They are also called shared-memory multiprocessor systems.
 High performance, high cost, high complexity.
Advantages:-
 Improved cost-performance ratio
 High-speed processing
 If one processor fails, the other processors continue working.

10. What are an instruction and an instruction set?
 An instruction is a part of code, written as one step of a step-by-step procedure for the CPU to complete a certain task.
 An instruction set is the set of instructions that the processor can execute; it contains all the details of the tasks.
 Examples from an instruction set:-
Arithmetic instructions, e.g. ADD, SUB
Logic instructions, e.g. AND, OR, NOT
Data transfer instructions, e.g. MOVE, LOAD, STORE
Control flow instructions, e.g. GOTO, CALL, RETURN

11. What is instruction format?
 The format in which instructions are written is called the instruction format.
 Each instruction has three fields:
OPCODE  specifies which operation is to be performed.
MODE  specifies how to find the effective address.
ADDRESS  specifies the address in memory/register.

OPCODE | MODE | ADDRESS

12. What are the different logical instructions?

INSTRUCTION | EXAMPLE | Equivalent to
AND | AND $1, $2, $3 | $1 = $2 & $3
OR | OR $1, $2, $3 | $1 = $2 | $3
NOR | NOR $1, $2, $3 | $1 = ~($2 | $3)
ANDI | ANDI $1, $2, imme | $1 = $2 & imme
ORI | ORI $1, $2, imme | $1 = $2 | imme
SHIFT LEFT LOGICAL | SLL $1, $2, 10 | $1 = $2 << 10
SHIFT RIGHT LOGICAL | SRL $1, $2, 10 | $1 = $2 >> 10 (zero fill)
SHIFT RIGHT ARITHMETIC | SRA $1, $2, 10 | $1 = $2 >> 10 (sign fill)

13. Write the different control operations.
Conditional branch:-
BEQ instruction  Branch on EQual (BEQ $s, $t, offset)
BNE instruction  Branch on Not Equal (BNE $s, $t, offset)
Unconditional branch:-
J instruction  Jump (J target)
JAL instruction  Jump And Link (JAL target)
JR instruction  Jump Register (JR $s)

14. What is PC-relative addressing?
 It is also called program counter addressing.
 The address of the data or instruction is specified as an offset relative to the incremented program counter.
 It is used in conditional branches.
 The offset value can be a direct or an indirect value.
 Operand address = PC + offset
 Ex: BEQZ $t0, strEnd (BEQZ = Branch if EQual to Zero)

15. State Moore's law.
 "The number of transistors per square inch on integrated circuits (ICs) has doubled every year since the IC was invented."
 "Computer architects should design for where the technology will be when the design is finished, not where it started."

16. State Amdahl's law.
 Improve the performance of the common case; optimize the rare case.
 Design for the common case rather than for the infrequent cases.
 This method makes the design simpler and faster.
 This idea is called Amdahl's law.
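The power formula in question 8 can be sketched as a small calculation. The function name and the capacitance, voltage and frequency figures below are illustrative values of our own, not from the syllabus:

```python
def cpu_power(capacitive_load, voltage, frequency):
    """Dynamic CPU power: P = C * V^2 * f."""
    return capacitive_load * voltage ** 2 * frequency

# Illustrative (made-up) values: because power grows with the square of
# the voltage, halving V at the same C and f cuts power to one quarter.
p_full = cpu_power(1e-9, 1.0, 2e9)   # C = 1 nF, V = 1.0 V, f = 2 GHz
p_half = cpu_power(1e-9, 0.5, 2e9)
```

Here p_full works out to 2.0 W and p_half to 0.5 W, showing the quadratic dependence on voltage that makes voltage scaling the main lever for power reduction.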
PART – B

1. Explain the eight ideas invented by the architects for designing the computer system.
 Design for Moore's Law
 Use abstraction to simplify design
 Make the common case fast
 Performance via parallelism
 Performance via pipelining
 Performance via prediction
 Hierarchy of memories
 Dependability via redundancy

Design for Moore's law:-
 It was developed by Gordon Moore.
 He is the co-founder of Intel.
 "The number of transistors per square inch on integrated circuits (ICs) has doubled every year since the IC was invented."
 "Computer architects should design for where the technology will be when the design is finished, not where it started."
 Moore's law graph:-
o "Up and to the right"
o Represents designing for quick change

Use abstraction to simplify the design:-
 Abstraction means hiding some part of the component.
 It is used by programmers and architects.
 It is used to represent the design at many levels.
 At each level, its low-level details are hidden.
 This concept improves productivity.
 It simplifies the design.
 It reduces the time needed for designing.
 Ex:-
o In an OS, I/O management details are hidden.
o In high-level languages, the underlying sequence of instructions is hidden.

Make the common case fast:-
 Many big improvements in the performance of a computer come from improvements in the common case.
 Improve the performance of the common case; optimize the rare case.
 Design for the common case rather than for the infrequent cases.
 This method makes the design simpler and faster.
 This idea is called Amdahl's law.
 Ex:-
o It is easy to design a sports car from an ordinary car.
o But it is not possible to design a sports car from a van.

Performance via parallelism:-
 Computer architects improve performance by performing the operations in parallel.
 A processor handles several activities simultaneously in the execution of an instruction.
 Advantage: faster performance of the CPU.

Performance via pipelining:-
 It is an extended concept of parallelism.
 A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next element.
 Elements of the pipeline that are independent are executed in parallel to improve performance.

Performance via prediction:-
 Nowadays, processors reduce the bad effects of branches.
 Predict the outcome of a condition.
 Then start executing the indicated instructions.
 It is better than waiting for the correct answer.
 If the prediction is accurate, performance is improved.

Hierarchy of memories:-
 Programmers need memory to be fast, large and cheap.
 A cache is a small memory for holding recently used data.
 Memory hierarchy, from top to bottom:-
o Speed: fast to slow
o Cost: expensive to cheap
o Size: smallest to largest

In-board storage | Registers, cache, main memory | Small, fast, costly
Out-board storage | Magnetic disk, optical disk (CD, DVD, Blu-ray) |
Off-line storage | Magnetic tapes | Large, slow, cheap

Dependability via redundancy:-
 This idea is expensive.
 It uses the RAID concept.
 RAID – Redundant Array of Inexpensive Disks.
 In RAID, data is stored redundantly on multiple disks.
 If one disk fails, the other disks continue working.

2. State the CPU performance equation and discuss the factors that affect performance.

CPU performance equation:-
 The CPU performance equation is defined as the ratio of the product of the instruction count and the number of steps to execute one instruction to the clock rate:

T = (N × S) / R

where
 T = CPU execution time / program execution time
 N = number of instructions
 S = number of steps to execute one instruction
 R = clock rate
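The equation T = (N × S) / R above can be exercised with a toy calculation. The function name and the instruction count, step count and clock rate are made-up numbers for illustration:

```python
def cpu_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation: T = (N * S) / R."""
    return n_instructions * steps_per_instruction / clock_rate_hz

# Hypothetical program: 10 million instructions, 2 steps each, 1 GHz clock.
t = cpu_time(10_000_000, 2, 1_000_000_000)  # 0.02 seconds
```

Doubling the clock rate R in this sketch halves T, while doubling either N or S doubles it, which is exactly the trade-off the equation expresses.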

Speed:-
 Speed measures the performance of the computer.
 It is used to measure how quickly a computer executes programs.

Response time:-
 The time between the starting and ending of a task.
 It is measured in seconds per program.
 It includes disk access, memory access and I/O activities.
 It is also called execution time, wall-clock time or elapsed time.

Throughput:-
 The total amount of work done in a given time.
 If response time is decreased, throughput is increased.

Increasing the speed:-
 To increase the speed of a computer:-
o Decrease the response time
o Increase the throughput
 To decrease response time and increase throughput:-
o Use a faster version of the processor
o Add extra processors

Relation between performance and execution time:-
Performance = 1 / Execution time
Let X and Y be two different computers. If
Performance_X > Performance_Y
then
1 / Execution time_X > 1 / Execution time_Y
and therefore
Execution time_Y > Execution time_X

CPU time:-
 The amount of time the CPU spends doing a task is called the CPU time.
 It is also called the CPU execution time.
 Time spent waiting for I/O is not included.
 CPU time = CPU time spent in the program (user CPU time) + CPU time spent in the OS (system CPU time)
Ex:-
Given:-
User CPU time = 90.7 sec
System CPU time = 12.9 sec
Elapsed time = 2 min 39 sec (159 sec)
Find:-
CPU time = 90.7 + 12.9 = 103.6 sec
Fraction of elapsed time spent as CPU time = 103.6 / 159 ≈ 0.65

Performance equation – 1:-
CPU execution time = CPU clock cycles × clock cycle time
 Clock rate is the inverse of clock cycle time:
Clock rate = 1 / clock cycle time
 Therefore:
CPU execution time = CPU clock cycles / clock rate
 Performance can be improved by reducing the length of the clock cycle or the number of clock cycles.
 Execution time also depends on the number of instructions in a program:
CPU clock cycles = instructions × average clock cycles per instruction
 CPI (Clock cycles Per Instruction) = the average number of clock cycles taken by each instruction for execution.

Performance equation – 2:-
CPU execution time = instruction count × CPI × clock cycle time
CPU execution time = (IC × CPI) / clock rate

Power equation:-
 If performance is increased, clock speed is also increased.
 If clock speed is increased, heat is also increased.
 If heat is increased, power consumption is also increased.
 Formula for the power consumed by the CPU:-
P = C × V² × f
where P = power, C = capacitive loading, V = voltage, f = frequency.

3. What is instruction format? What are the types of instructions available?
 The format in which instructions are written is called the instruction format.
 Some specific rules have to be followed while writing the instructions.
 Each instruction has three fields:
OPCODE  specifies which operation is to be performed.
MODE  specifies how to find the effective address.
ADDRESS  specifies the address in memory/register.

OPCODE | MODE | ADDRESS

Types of instructions:-
 Three-address instructions
 Two-address instructions
 One-address instructions
 Zero-address instructions

Three-address instructions:-
 The addresses of three registers are mentioned.
 Bits are required to specify the three addresses of the three operands.
 Bits are required to specify the operation.
 Syntax: Operation Destination, Source1, Source2
 Ex: ADD A, B, C
 This instruction adds B + C and stores the result in A (A  B + C).
 Where ADD  operation, A  destination, B, C  sources.
 More execution time is taken because of the three addresses.

Two-address instructions:-
 The addresses of two registers are mentioned.
 Bits are required to specify the two addresses of the two operands.
 Bits are required to specify the operation.
 Syntax: Operation Destination, Source
 Ex: ADD A, B
 This instruction adds A + B and stores the result in A (A  A + B).
 Where ADD  operation, A  destination, B  source.
 Less execution time than three-address instructions.

One-address instructions:-
 The address of one register is mentioned.
 Bits are required to specify one address of one operand.
 Bits are required to specify the operation.
 Syntax: Operation Destination (or) Operation Source
 Less execution time than two-address instructions.
 Ex: ADD A
o This instruction adds the contents of register A and the accumulator (A  A + AC).
o Where ADD  operation, A  destination; the operand acts as the destination.
 Ex: LOAD A
o This instruction loads the contents of register A into the accumulator (AC  A).
o Where LOAD  operation, A  source; the operand acts as the source.
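The worked CPU-time example above can be reproduced in a few lines. The function names are our own, but the numbers (90.7 s user, 12.9 s system, 159 s elapsed) are the ones given in the example:

```python
def total_cpu_time(user_cpu_time, system_cpu_time):
    """CPU time = user CPU time + system CPU time (I/O wait excluded)."""
    return user_cpu_time + system_cpu_time

def cpu_share(user_cpu_time, system_cpu_time, elapsed_time):
    """Fraction of elapsed (wall-clock) time spent as CPU time."""
    return total_cpu_time(user_cpu_time, system_cpu_time) / elapsed_time

cpu = total_cpu_time(90.7, 12.9)       # 103.6 seconds of CPU time
share = cpu_share(90.7, 12.9, 159.0)   # about 0.65 of the elapsed time
```

The remaining 35% of the elapsed time is spent waiting (for I/O, or running other programs), which is why CPU time and response time are kept as separate metrics.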

Zero-address instructions:-
 They contain no address fields.
 Source and destination operands are mentioned implicitly.
 The absolute address of the operand is kept in a special register that is automatically incremented or decremented.
 The top of the pushdown stack is pointed to.
 Bits are required to represent the operation only.
 Syntax: Operation
 Ex: ADD
 Much less execution time than one-address instructions.

4. What is a logical instruction? Explain some logical instructions, with an example of each.
 Instructions that perform logical operations which manipulate Boolean values are called logical instructions.

INSTRUCTION | EXAMPLE | Equivalent to
AND | AND $1, $2, $3 | $1 = $2 & $3
OR | OR $1, $2, $3 | $1 = $2 | $3
NOR | NOR $1, $2, $3 | $1 = ~($2 | $3)
ANDI | ANDI $1, $2, imme | $1 = $2 & imme
ORI | ORI $1, $2, imme | $1 = $2 | imme
SHIFT LEFT LOGICAL | SLL $1, $2, 10 | $1 = $2 << 10
SHIFT RIGHT LOGICAL | SRL $1, $2, 10 | $1 = $2 >> 10 (zero fill)
SHIFT RIGHT ARITHMETIC | SRA $1, $2, 10 | $1 = $2 >> 10 (sign fill)

AND instruction:-
 It contains three register operands.
 Syntax: Operation Destination, Source1, Source2
 Ex: AND $1, $2, $3
 It performs a bitwise AND operation between source1 and source2 and stores the result in the destination.
 It is equivalent to $1 = $2 & $3.

OR instruction:-
 It contains three register operands.
 Syntax: Operation Destination, Source1, Source2
 Ex: OR $1, $2, $3
 It performs a bitwise OR operation between source1 and source2 and stores the result in the destination.
 It is equivalent to $1 = $2 | $3.

NOR instruction:-
 It contains three register operands.
 Syntax: Operation Destination, Source1, Source2
 Ex: NOR $1, $2, $3
 It performs a bitwise NOR operation between source1 and source2 and stores the result in the destination.
 It is equivalent to $1 = ~($2 | $3).

ANDI instruction:-
 It contains two register operands and an immediate value.
 Syntax: Operation Destination, Source1, Immediate
 Ex: ANDI $1, $2, imme
 It performs a bitwise AND operation between the source register and the immediate value.
 It stores the result in the destination register.
 It is equivalent to $1 = $2 & imme.

ORI instruction:-
 It contains two register operands and an immediate value.
 Syntax: Operation Destination, Source1, Immediate
 Ex: ORI $1, $2, imme
 It performs a bitwise OR operation between the source register and the immediate value.
 It stores the result in the destination register.
 It is equivalent to $1 = $2 | imme.

Shift Left Logical instruction:-
 It contains two register operands and a shift amount.
 Syntax: Operation Destination, Source1, Constant
 Ex: SLL $1, $2, 10
 It shifts the value of register $2 left by 10 places.
 Extra zeros are shifted in.
 It stores the result in destination $1.
 It is equivalent to $1 = $2 << 10.

Shift Right Logical instruction:-
 It contains two register operands and a shift amount.
 Syntax: Operation Destination, Source1, Constant
 Ex: SRL $1, $2, 10
 It shifts the value of register $2 right by 10 places.
 Extra zeros are shifted in.
 It stores the result in destination $1.
 It is equivalent to $1 = $2 >> 10 (zero fill).

Shift Right Arithmetic instruction:-
 It contains two register operands and a shift amount.
 Syntax: Operation Destination, Source1, Constant
 Ex: SRA $1, $2, 10
 It shifts the value of register $2 right by 10 places.
 The sign bit is shifted in.
 It stores the result in destination $1.
 It is equivalent to $1 = $2 >> 10 (sign fill).

5. What is an addressing mode? Explain the types of addressing modes.
 Register addressing mode
 Absolute addressing mode (or) direct mode
 Immediate addressing mode
 Indirect addressing mode
 Index addressing mode
 Relative addressing mode
 Auto-increment mode
 Auto-decrement mode

Register addressing mode:-
 It is the simplest addressing mode.
 Both operands are registers.
 It is much faster than the other addressing modes, because no extra memory access is needed.
 The name of the register is mentioned in the instruction.
 Ex: ADD R1, R2 (R1  R1 + R2)
 Where ADD  operation, R1  destination, R2  source.
(Figure: the address field of the instruction selects a register, whose contents are the operand.)

Absolute addressing mode:-
 It is also called direct addressing mode, because the address of the memory location of the operand is given directly in the instruction.
 Ex: MOVE A, 2000
 This instruction copies the contents of memory location 2000 into register A.
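Python integers are unbounded, so a 32-bit mask is needed to mimic the register behaviour described above. This sketch (with helper names of our own) shows why the table lists SRL and SRA separately even though both are written `$2 >> 10`: SRL shifts in zeros, SRA shifts in copies of the sign bit:

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    return x - (1 << 32) if x & 0x80000000 else x

def nor32(a, b):
    return ~(a | b) & MASK32             # NOR: $1 = ~($2 | $3)

def sll(x, n):
    return (x << n) & MASK32             # SLL: zeros enter from the right

def srl(x, n):
    return (x & MASK32) >> n             # SRL: zeros enter from the left

def sra(x, n):
    return (to_signed(x) >> n) & MASK32  # SRA: sign bit is replicated
```

For the negative pattern 0xFFFFFFF0 (that is, -16), srl(0xFFFFFFF0, 4) gives 0x0FFFFFFF, while sra(0xFFFFFFF0, 4) gives 0xFFFFFFFF, which is -16 / 16 = -1 as a signed value.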
Immediate addressing mode:-
 The operand is given directly as a numerical value.
 It doesn't require any extra memory access to fetch the operand.
 It executes faster (hence "immediate").
 Ex: MOVE A, #20
o The # symbol says that it is an immediate operand.
o The value 20 is moved to register A.
 Ex: ADDI $t1, $0, 1
o Where ADDI  ADD Immediate instruction, $t1  destination operand, $0  source register, 1  immediate value.

Indirect mode:-
 It is also called register indirect addressing mode.
 Here, the address is not given directly.
 The memory address has to be determined from the instruction.
 These addresses are called Effective Addresses (EA).
 The effective address of the operand is the contents of a register (or of a main memory location) whose address is given directly in the instruction.
 When the effective address of the operand is the contents of a register, it is called register indirect addressing mode.
 Ex: MOVE A, (R0)
 It copies the contents of the memory location addressed by the contents of register R0 into register A.
 A register given within parentheses ( ) is called a pointer.
(Figure: R0 holds 2000; memory location 2000 holds 10; after MOVE A, (R0), A = 10.)

Index addressing mode:-
 Indexing is a technique that allows the programmer to refer to data (operands) stored in memory locations one by one.
 In index addressing mode, the effective address of the operand is generated by adding a constant value to the contents of a register.
 That constant is specified in the instruction.
 Ex: MOVE 20(R1), R2
 It loads the contents of register R2 into the memory location whose address is the contents of R1 + 20.
 Where MOVE  operation, (R1)  contents of R1, 20(R1)  contents of R1 + offset 20, R2  source operand.
(Figure: R1 holds 1000; effective address = 1020; memory location 1020 receives the value 10.)

Relative addressing mode:-
 It is also called PC-relative addressing mode, because the program counter is used in this mode.
 Here, the effective address is calculated as in index mode, using the program counter.
 It is generally used in branch instructions.
 Operand address = PC + offset
 Ex: BEQZ $t0, END
 Where $t0  source operand.

Auto-increment mode:-
 Here the effective address of the operand is the contents of a register specified in the instruction.
 After accessing the operand, the contents of this register are incremented to address the next location.
 Ex: MOVE R0, (R2)+
 The contents of R0 are copied into the memory location whose address is in register R2.
 After copying, the contents of register R2 are automatically incremented by 1.

Auto-decrement mode:-
 The contents of the register are decremented by one, and the result is used as the effective address of the operand.
 The decrement operation on the register is done first, and then the instruction continues.
 Ex: MOVE R1, -(R0)
 The address in R0 is decremented by one; the operand at the resulting address is then moved to register R1.

6. Assume a two-address format specified as source, destination. Examine the following sequence of instructions and explain the addressing modes used and the operation done by every instruction:-
i) MOVE (R5)+, R0
ii) ADD (R5)+, R0
iii) MOVE R0, (R5)
iv) MOVE 16(R5), R3
v) ADD #40, R5
Solution:-
i) MOVE (R5)+, R0
 Addressing mode: auto-increment addressing mode.
 This instruction can be split as:
MOVE (R5), R0
INCREMENT R5
 This is also called automatic post-increment mode, because the increment is done after the operation.
 Operation: R5 is the source, R0 is the destination (given).
 R5 contains a memory address; go to that memory address and fetch the data from there.
 Move it to register R0.
 Then increment R5.
ii) ADD (R5)+, R0
 Addressing mode: auto-increment addressing mode.
 This instruction can be split as:
ADD (R5), R0
INCREMENT R5
 This is also called automatic post-increment mode, because the increment is done after the operation.
 Operation: R5 is the source, R0 is the destination (given).
 R5 contains a memory address; go to that memory address and fetch the data from there.
 Add it to register R0, and store the result in R0.
 Then increment R5.
iii) MOVE R0, (R5)
 Addressing mode: register indirect addressing mode.
 Because only registers are used in this instruction, and one register is given indirectly within parentheses.
 R0  source, R5  destination pointer.
 The contents of R0 are moved to the memory location whose address is contained in R5.
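The instruction traces in question 6 can be mimicked with a tiny register/memory model. The helper names and the starting register and memory values below are invented for illustration:

```python
def move_autoinc(regs, mem, src, dst):
    """MOVE (Rs)+, Rd: fetch the operand via Rs, move it to Rd,
    then increment Rs (automatic post-increment, instruction (i))."""
    regs[dst] = mem[regs[src]]
    regs[src] += 1

def add_autoinc(regs, mem, src, dst):
    """ADD (Rs)+, Rd: add the operand fetched via Rs into Rd,
    then increment Rs (instruction (ii))."""
    regs[dst] += mem[regs[src]]
    regs[src] += 1

def move_indirect(regs, mem, src, ptr):
    """MOVE Rs, (Rd): store Rs at the address held in Rd (instruction (iii))."""
    mem[regs[ptr]] = regs[src]

# Made-up starting state: R5 points at address 100.
regs = {"R0": 0, "R5": 100}
mem = {100: 7, 101: 3, 102: 0}

move_autoinc(regs, mem, "R5", "R0")   # R0 = mem[100] = 7, R5 becomes 101
add_autoinc(regs, mem, "R5", "R0")    # R0 = 7 + mem[101] = 10, R5 becomes 102
move_indirect(regs, mem, "R0", "R5")  # mem[102] = 10
```

The trace makes the post-increment visible: each auto-increment access consumes one memory word and leaves R5 pointing at the next one, which is exactly why the mode suits stepping through arrays.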

iv) MOVE 16(R5), R3
 Addressing mode: indexed addressing mode.
 Operation:-
 The operand at offset 16 from the address in R5 is moved to R3.
 Effective address (EA) = (R5) + 16
v) ADD #40, R5
 Addressing mode: immediate addressing mode.
 Operation:-
 The # sign indicates that this is an immediate operand.
 Here the source is #40 and the destination is R5 (given).
 It adds the operand value 40 to the value of register R5.
 It stores the result in R5.

7. Explain the components of a computer system in detail.
Components of a computer:-
 Hardware
o Input unit
o CPU (ALU, control unit, registers)
o Memory
o Output unit
 Software
o System software (program development environment, run-time environment)
o Application software (Java, games, MS Office, etc.)

HARDWARE COMPONENTS:-
The organization of a computer has four major parts. They are:-
o Input unit
o CPU
o Output unit
o Memory unit

Input unit:-
 Input devices get the data from the user and convert it to a machine-understandable form.
 Ex: keyboard, mouse, scanner, joystick, light pen, card reader, webcam, microphone.
Keyboard:-
 It is a standard input device attached to all types of computers.
 It contains keys arranged in the QWERTY layout.
 It contains many keys such as TAB, CAPS LOCK, SPACE BAR, ALT, CTRL, ENTER, HOME, END, etc.
 It contains 101 to 104 keys.
 If we press the keys on the keyboard, electrical signals are sent to the computer.
Mouse:-
 It is used with personal computers.
 Old types of mice have a magnetic ball at the back.
 Nowadays, infrared mice are used, which work using infrared light at the back of the mouse.
Scanner:-
 A keyboard can give only characters as input.
 Scanners can give a picture as input to the computer.
 A scanner is an optical device that takes a picture and gives it as input to the computer.
 Ex: MICR, OMR, OCR.

CPU:-
 It is called the brain of the computer.
 It performs tasks such as arithmetic and logical operations.
 The CPU is divided into three parts: ALU, control unit and registers.
ALU:-
 After the system gets input data, it is stored in primary storage.
 The actual processing of data takes place in the Arithmetic Logic Unit (ALU).
 It performs addition, subtraction, multiplication, division, logical comparison, etc.
 It also performs AND, OR, NOT, XOR, etc. operations.
Control unit:-
 It acts like the supervisor of the computer.
 It controls the overall activities of the computer's components.
 It checks whether all the operations of the computer are going on correctly or not.
 It determines how the instructions are executed one by one.
 It controls all the input and output operations of the computer.
 For executing an instruction, it performs the following steps:-
o The address of the instruction is placed on the address bus.
o The instruction is read from memory.
o The instruction is sent for decoding.
o The data at the operand address is read from memory.
o The data and address are sent to the memory.
o Then the next instruction is taken from memory.
Registers:-
 They are high-speed memory units for storing temporary data.
 They are small in size.
 They store data, instructions, addresses, etc.
 The accumulator is a register that works closely with the ALU.
 Types: accumulator, GPRs, SPRs (PC, MAR, MBR, IR).
 Accumulator: stores the operands before execution; it receives the result of the ALU operation.
 GPR: General Purpose Registers are used to store data and intermediate results.
 SPR: Special Purpose Registers are used for specific purposes.
 PC: Program Counter, MAR: Memory Address Register, MBR: Memory Buffer Register, IR: Instruction Register.

Memory unit:-

Primary storage:-
 It is directly attached to the CPU.
 Its storage capacity is limited.
 It contains magnetic cores or semiconductor cells.
 It is used for temporary storage.
ROM:-
o Read Only Memory; a major type of memory in a computer.
o It can be read, but cannot be written.
o It is used for storing permanent values.
o ROM does not get erased, even after the power is switched off.
o It is non-volatile (its information is not lost when the power goes off).
o We can store important data in ROM.
o Types: PROM, EPROM, EEPROM.
RAM:-
o Random Access Memory.
o It is used for storing the programs and data that are being executed.
o It is different from ROM: it can be read and written.
o It is volatile: when the power is turned off, its contents are erased.
o It is also called RWM (Read/Write Memory).
o It is faster than ROM.
o Static RAM (SRAM) and Dynamic RAM (DRAM) are its types.
o Its cost is high.
o Its processing speed is also high.
Cache memory:-
o It is a very small memory used to store intermediate results and data.
o It stores the data that is most frequently used.
o It is present inside the CPU, near the processor.
o It is used for faster execution.

Secondary storage:-
o The speed of primary memory is fast, but secondary memory is slow.
o The capacity of primary memory is low, so secondary memory is used.
o It contains a large memory space.
o It is also called additional memory or auxiliary memory.
o Data is stored in it permanently.
o Ex: magnetic tape, hard disk, floppy disk, optical disc, etc.
Magnetic tape:-
 It is used with large computers like mainframe computers.
 Large volumes of data are stored for a long time.
 It is like an old tape-recorder cassette.
 Tapes are cheap and store data permanently.
 It is compact, low-cost, portable, with practically unlimited storage.
Optical disk:-
 CD-ROM:-
o Compact Disc.
o They are made of reflective material.
o A high-power laser beam is used to store data on the CD.
o Cost is low; storage capacity is about 700 MB.
o It can only be read, and cannot be written.
o Only a single side can be used for storage.

Merits – CD | Demerits – CD
Large capacity compared to ROM | Read only, cannot be updated
Cheaper, light weight | Access is slow compared to a magnetic disk
Reliable, removable and efficient | Needs careful handling; easily gets scratched

 CD-RW:-
o We can both read and write data on this CD.
o Maximum capacity of 700 MB.
o Light weight, reliable, removable, efficient.
o A lot of space is wasted on the outer tracks.
 DVD:-
o Digital Video Disc.
o It is the improved version of the CD.
o Available in 4.7 GB, 8.54 GB, 9.4 GB and 17.08 GB capacities.
o Both sides are used for storage.
o They are not scratched or damaged as easily as CDs.
o We can store full movies or an OS on one single DVD.
 USB drives:-
o They are commonly called pen drives.
o They are removable storage.
o They are connected to the USB port of a computer.
o They are fast and portable.
o They store more data than CDs or DVDs (1 GB to 64 GB pen drives).
Hard disk:-
 Hard disks are disks that store more data and work faster.
 They can store 10 GB to 2 TB.
 A hard disk consists of platters, with two heads per platter for reading and writing.
 They are attached to a single arm.
 Information on a hard disk is stored in tracks.
Floppy disk:-
 They can store 1.4 MB of data.
 They are 5.25 to 3.5 inches in diameter.
 They are cheap and portable.

Output unit:-
 It is a medium between the computer and the human.
 After the CPU finishes an operation, the output is displayed by the output unit.
 Types of output: hardcopy, softcopy.
 Hardcopy: output that can be seen physically, e.g. printed using a printer.
 Softcopy: the electronic version of the output, stored in a computer, a memory card or a hard disk.
 Ex: monitor, printer, plotter.
Monitor:-
 It is the most popular output device.
 It is also called a Visual Display Unit (VDU).
 It is connected to the computer through a cable called the video cord.
 LCD: Liquid Crystal Display monitors
o Flat screens; liquid crystals are used for the display.
 CRT: Cathode Ray Tube monitors
o They are old-fashioned, TV-set-like monitors.
Printer:-
 The output of a computer can be printed using a printer, to get a hardcopy.
 Laser printers, inkjet printers: non-impact printers. They give fast printouts with good quality.
 Dot-matrix printers: impact printers. Their quality is poorer; they are used for billing purposes.

Plotter:-
 Plotters are used for printing graphics.
 They are used in CAD/CAM.
 Pen plotters produce a printout by moving a pen across the paper.

SOFTWARE COMPONENTS
1. System software:-
 It is built into the computer system.
 It is essential for a computer to operate.
 A computer cannot run without it.
 It controls and manages the hardware components.
Software for the program development environment:-
 Text editor: to type the program and make changes
 Compiler: converts high-level language to machine code
 Assembler: converts assembly language to machine code
 Linker: combines OBJ programs and creates EXE code
 Debugger: to remove errors from the EXE program
Software for the run-time environment:-
 OS: operates the overall computer system
 Loader: loads the EXE file into memory for execution
 Libraries: precompiled LIB files that are used by other programs
 Dynamic linker: loads and links shared libraries at run time
2. Application software:-
 Application software is needed for problem solving.
 Programs such as Java applications, games, MS Word, dictionaries and emulators are examples.
 It is not necessary for a computer to operate.
 It is optional; users can install it if they want.

UNIT 3 – PROCESSOR & CONTROL UNIT

Part – A

1. How is the performance of a CPU measured?
 Instruction Count: It is determined by the Instruction Set Architecture (ISA) and the compiler.
 Cycles Per Instruction (CPI) and clock cycle time: They are determined by the CPU hardware.

2. Write the basic performance equation of CPU
 CPU time = Instruction count X CPI X Clock cycle time (or)
 CPU time = (Instruction count X CPI) / Clock rate

3. What is MIPS?
 Million Instructions Per Second
 It is a metric used to measure CPU performance
 It is defined as the ratio of instruction count to the product of execution time and 10^6
 MIPS = Instruction count / (Execution time X 10^6)

4. What are the types of instruction in the MIPS instruction set?
 Memory reference instructions : Load word (LW), store word (SW)
 Arithmetic logical instructions : ADD, SUB, AND, OR, SLT
 Control flow instructions : BEQ, JUMP

5. What are the steps involved in MIPS instruction execution?
 Fetch instruction from memory
 Decode the instruction
 Execute the operation
 Access an operand in data memory
 Write the result into a register

6. What is a data path?
 Data path is the pathway that data takes through the CPU
 Data travels through the data path; the control unit regulates it
 It consists of functional units that perform ALU operations

7. What is PC?
 Program counter is defined as a register which is used to store the address of the instruction in the program being executed
 It is a 32-bit register, written automatically at the end of each clock cycle
 No WRITE control signal is needed

8. What is pipelining? Mention its purpose and advantages
 Pipelining is defined as the implementation technique in which more than one instruction is overlapped for simultaneous execution
 It is used to make the processors fast
 It is divided into stages; each stage finishes a part of execution in parallel
 All stages are connected one to the next to form a pipe
 It increases instruction throughput.

9. What are the stages of MIPS pipeline?
 IF : Instruction fetch from memory
 ID : Instruction Decode
 EX : Execute the operation
 MEM : Access the memory for an operand
 WB : Write back the results in a register

10. Define hazard. Mention its types.
 Any condition that makes the pipeline stall is called a HAZARD
 It prevents execution of the next instruction in the instruction stream
 Structural hazard: two instructions use the same resource at the same time
 Data hazard: data are not available at the expected time in the pipeline
 Control hazard: branch decisions made before the branch condition is evaluated

11. What is data hazard? Mention its types.
 Data hazard occurs when data are not available at the expected time in a pipeline
 Consider two instructions: I1 occurs before I2
 RAW : Read After Write : I2 reads before I1 writes it
 WAW : Write After Write : I2 writes before I1 writes it
 WAR : Write After Read : I2 writes before I1 reads it

12. What are the methods to handle data hazard?
 Forwarding: the result is passed forward from a previous instruction to a later instruction
 Bypassing: passing the result via the register file to the desired unit

13. What are the methods to handle control hazard?
 Stall the pipeline
 Predict branch not taken
 Predict branch taken
 Delayed branch

14. Define an exception with an example.
 It is also called an interrupt
 It is defined as an unscheduled event that disturbs the normal execution of the program
Ex:
 ADD R1, R2, R1  R1 = R2 + R1
 Arithmetic overflow has occurred
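The performance formulas in Q2 and Q3 can be sanity-checked with a small sketch (the instruction count, CPI, and clock rate below are made-up values, purely for illustration):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    # CPU time = (Instruction count x CPI) / Clock rate
    return instruction_count * cpi / clock_rate_hz

def mips_rating(instruction_count, execution_time_s):
    # MIPS = Instruction count / (Execution time x 10^6)
    return instruction_count / (execution_time_s * 1e6)

# Hypothetical program: 10 million instructions, CPI = 2, 1 GHz clock
t = cpu_time(10_000_000, 2, 1_000_000_000)   # 0.02 seconds
rating = mips_rating(10_000_000, t)          # 500 MIPS
```

Note that a higher MIPS rating follows directly from a shorter execution time for the same instruction count.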

Part - B

1. Explain the types of MIPS Instruction format.

R-Format:-
Opcode | Rs    | Rt    | Rd    | SHAMT | FUNCT
31-26  | 25-21 | 20-16 | 15-11 | 10-6  | 5-0
 Also called Register format, because only registers are used
 Three register operands : Rs, Rt, Rd
 Rs, Rt  source registers
 Rd  destination register
 SHAMT  shift amount
 FUNCT  ALU function (ADD, SUB, AND, OR, SLT)
 Opcode for R-format = 0

ALU control lines | Function
000 | AND
001 | OR
010 | ADD
110 | SUB
111 | SLT (Set on Less Than)

I-Format:-
Opcode | Rs    | Rt    | Address
31-26  | 25-21 | 20-16 | 15-0
 For Load/Store instructions
o For LOAD, opcode = 35
o For STORE, opcode = 43
o Rs  base register
o Rt  for load, it is the destination register; for store, it is the source register
o Memory address = base register + 16-bit address field
 For Branch instructions
o For BRANCH, opcode = 4
o Rs, Rt  source registers
o Target address = PC + (sign-extended 16-bit offset address << 2)

J-Format:-
Opcode | Address
31-26  | 25-0
 For Jump instructions, opcode = 2
 Destination address = PC[31-28] | | (offset address << 2)

2. Explain Datapath and its control in detail.
(Or)
Explain Datapath and its control implementation schemes for MIPS instruction formats with neat diagrams.

Data path:-
 Data path is the pathway that data takes through the CPU
 Data travels through the data path; the control unit regulates it
 It consists of functional units that perform ALU operations

Functional Units of data path:-

Instruction memory:-
 It is a memory unit to store instructions of a program and supply instructions

Program Counter:-
 Program counter is defined as a register which is used to store the address of the instruction in the program being executed

Adder:-
 Increments the PC to point to the next instruction
 An ALU is connected to perform addition of its two 32-bit inputs, placing the result on its output

Registers:-
 It is a structure that contains the processor's 32 GPRs
 They can be read / written
 It contains 4 inputs (2 read ports + 1 write port + 1 write data)
 It contains 2 outputs (two read data)

ALU:-
 Input: two 32-bit operands
 Output: 32-bit result

Data memory Unit:-
 Input: address and write data
 Output: read result

Sign extension Unit:-
 Input: 16-bit value
 Output: 32-bit sign-extended value

MUX:-
 It is also called a data selector
 It allows multiple connections to the input of an element and has a control signal SELECT to choose among the inputs.

Building a data path:-

Fetch instructions:-
 To execute any instruction, first the instruction is fetched from memory
 To prepare for executing the next instruction, the PC is incremented by 4 bytes, which points to the next instruction

Data path for R-Format instructions:-
 Register File and ALU are needed additionally with the previous components.
 ALU gets its inputs from the Read data ports of the register file
 The register file is written by the ALUResult output of the ALU with the RegWrite signal
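The R-format bit fields above can be pulled out of a 32-bit instruction word with shifts and masks. A minimal sketch (the register numbers in the encoded ADD are chosen arbitrarily for illustration):

```python
def decode_r_format(word):
    # R-format: opcode(31-26) rs(25-21) rt(20-16) rd(15-11) shamt(10-6) funct(5-0)
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6)  & 0x1F,
        "funct":  word & 0x3F,
    }

# ADD $1, $2, $3 : opcode = 0, funct = 100000 (0x20), rd=1, rs=2, rt=3
word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x20
fields = decode_r_format(word)
```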

Data path for Load/Store instructions:-
 Data memory unit and Sign extension unit are needed additionally
 The register inputs are read from the instruction fields
 The memory address is calculated based on the instruction field
 For load, data at the memory address is read from data memory
 For store, write data is written into data memory

Data path for Branch/Jump instructions:-
 Branch target = incremented PC + (sign-extended lower 16 bits of instruction, shifted left 2 bits)
 Compare register contents using the ALU

Combining data paths for simple implementation:- (Diagram)

ALU control lines | Function
000 | AND
001 | OR
010 | ADD
110 | SUB
111 | SLT (Set on Less Than)

Setting of the ALU control input:-
Opcode | ALUOp | Operation        | FUNCT field | ALU action | ALU control input
LW     | 00    | load word        | XXXXXX      | add        | 010
SW     | 00    | store word       | XXXXXX      | add        | 010
BEQ    | 01    | branch on equal  | XXXXXX      | subtract   | 110
R-type | 10    | ADD              | 100000      | add        | 010
R-type | 10    | SUB              | 100010      | subtract   | 110
R-type | 10    | AND              | 100100      | AND        | 000
R-type | 10    | OR               | 100101      | OR         | 001
R-type | 10    | Set on Less Than | 101010      | SLT        | 111

Truth Table for the ALU control bits:-
ALUOp1 | ALUOp0 | F5 F4 F3 F2 F1 F0 | Operation
0      | 0      | X  X  X  X  X  X  | 0010
X      | 1      | X  X  X  X  X  X  | 0110
1      | X      | X  X  0  0  0  0  | 0010
1      | X      | X  X  0  0  1  0  | 0110
1      | X      | X  X  0  1  0  0  | 0000
1      | X      | X  X  0  1  0  1  | 0001
1      | X      | X  X  1  0  1  0  | 0111

Control Signals used in Control implementation:-

Signal   | When RESET (0)                    | When SET (1)
RegDst   | Destination register number comes from Rt | Destination register number comes from Rd
RegWrite | None                              | Register on the Write register input is written with the value on the Write data input
ALUSrc   | Second ALU operand comes from the second register file output | Second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc    | PC is replaced by the output of the adder that calculates PC + 4 | PC is replaced by the output of the adder that calculates the branch target
MemRead  | None                              | Data memory contents designated by the address input are put on the Read data output
MemWrite | None                              | Data memory contents designated by the address input are replaced by the value on the Write data input
MemtoReg | Value fed to the register Write data input comes from the ALU | Value fed to the register Write data input comes from data memory
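The ALU-control truth table above maps the 2-bit ALUOp and the funct field to a 4-bit ALU control value; it can be expressed directly as a small lookup sketch:

```python
def alu_control(aluop, funct):
    # ALUOp 00 -> add (LW/SW address calculation)
    # ALUOp 01 -> subtract (BEQ comparison)
    # ALUOp 10 -> decode the funct field of an R-type instruction
    if aluop == 0b00:
        return 0b0010          # add
    if aluop == 0b01:
        return 0b0110          # subtract
    r_type = {                 # funct field -> ALU control
        0b100000: 0b0010,      # ADD
        0b100010: 0b0110,      # SUB
        0b100100: 0b0000,      # AND
        0b100101: 0b0001,      # OR
        0b101010: 0b0111,      # SLT
    }
    return r_type[funct]
```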

Control signal settings for each instruction format:-

Instruction format | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format           | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1      | 0
LW                 | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0      | 0
SW                 | X      | 1      | X        | 0        | 0       | 1        | 0      | 0      | 0
BEQ                | X      | 0      | X        | 0        | 0       | 0        | 1      | 0      | 1

Data path : R-Type:-
 Fetch instruction and increment PC
 Get operands from the register file, based on the source register numbers
 Perform the ALU operation using ALUSrc=0
 Select the output from the ALU using MemtoReg=0
 Write back to the destination register (RegWrite=1, RegDst=1)

Data path : Memory access (Load):-
 Fetch instruction and increment PC
 Get the base register operand from the register file
 Perform addition of the register value and offset with ALUSrc=1
 Use the ALU result as the address for data memory
 Use MemtoReg=1 to select the read data and write back to the destination register using RegWrite=1, RegDst=0
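The signal settings in the table above can be checked with a small decoder sketch (opcode values taken from the I-format section: R-type=0, LW=35, SW=43, BEQ=4; 'x' marks a don't-care):

```python
def main_control(opcode):
    # Returns (RegDst, ALUSrc, MemtoReg, RegWrite,
    #          MemRead, MemWrite, Branch, ALUOp)
    if opcode == 0:            # R-format
        return (1, 0, 0, 1, 0, 0, 0, 0b10)
    if opcode == 35:           # LW
        return (0, 1, 1, 1, 1, 0, 0, 0b00)
    if opcode == 43:           # SW
        return ('x', 1, 'x', 0, 0, 1, 0, 0b00)
    if opcode == 4:            # BEQ
        return ('x', 0, 'x', 0, 0, 0, 1, 0b01)
    raise ValueError("opcode not in the single-cycle subset")
```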

Data path : memory access (store):-
 Fetch instruction and increment PC
 Get base register and data from the register file
 Perform addition of the register value and offset with ALUSrc=1
 Use the ALU result as the address for data memory
 Using MemWrite=1, write the data operand to the memory address

Data path : Jump:-
 Shift instruction bits 25-0 left two bits to create a 28-bit value
 Combine with bits 31-28 of PC+4 to get the 32-bit jump address
 An additional MUX uses the Jump control signal to select the instruction address
o 0: incremented PC (or) branch target
o 1: jump address
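The address arithmetic in the branch and jump steps can be sketched as follows (addresses are plain Python integers, purely for illustration):

```python
def branch_target(pc, offset16):
    # sign-extend the 16-bit offset, shift left 2 bits, add to PC + 4
    if offset16 & 0x8000:
        offset16 -= 1 << 16
    return (pc + 4) + (offset16 << 2)

def jump_target(pc, addr26):
    # upper 4 bits of PC + 4 concatenated with the 26-bit field shifted left 2
    return ((pc + 4) & 0xF0000000) | ((addr26 << 2) & 0x0FFFFFFF)
```

For example, a branch offset of 3 from PC = 0x1000 lands at 0x1004 + 12 = 0x1010.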

Data path : branch:-
 Fetch instruction and increment PC
 Read 2 registers from the register file for comparison
 The ALU subtracts the data values using ALUSrc=0
 Generate the branch address: add PC+4 and the sign-extended offset shifted left by 2
 Use the Zero output from the ALU to find which result is to be used for updating the PC
 If equal, use the branch address
 Else, use the incremented PC

3. What is pipelining? Explain its stages with an example.
 Pipelining is defined as the implementation technique in which more than one instruction is overlapped for simultaneous execution
Purpose:-
 It is used to make the processors fast
 It is divided into stages; each stage finishes a part of execution in parallel
 All stages are connected one to the next to form a pipe
Advantage:-
 It increases instruction throughput.

Example:-
Consider the following instructions:-
LW R1, 100(R0)
LW R2, 200(R0)
LW R3, 300(R0)

Without pipeline:- (Diagram)

With pipeline:- (Diagram)

Stages of pipeline:-
 IF : Instruction fetch from memory
 ID : Instruction Decode
 EX : Execute the operation
 MEM : Access the memory for an operand
 WB : Write back the results in a register

Graphical representation:- (Diagram)

4. Explain the types of hazards with examples.
 Structural hazard
 Data hazard
 Control hazard

Structural hazard:-
 It occurs when two instructions use the same resource at the same time
 Here, the 1st instruction is accessing data from memory while the 4th instruction is fetching an instruction from that same memory at the same time.

Time   | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8
Instr1 | IF  | ID  | Ex  | MEM | WB  |     |     |
Instr2 |     | IF  | ID  | Ex  | MEM | WB  |     |
Instr3 |     |     | IF  | ID  | Ex  | MEM | WB  |
Instr4 |     |     |     | IF  | ID  | Ex  | MEM | WB

Data hazard:-
 It occurs when data are not available at the expected time in the pipelined execution.
 Consider two instructions: I1 occurs before I2
RAW
 Read After Write
 I2 reads before I1 writes it
 So I2 gets the old value instead of the new one
WAW
 Write After Write
 I2 writes before I1 writes it
 I1 then overwrites the value, so the final value is wrong
WAR
 Write After Read
 I2 writes before I1 reads it
 So I1 incorrectly reads the new value

Handling data hazard:- (solutions)
Forwarding:- (Diagram)
Bypassing:- (Diagram)

Control hazards:-
 It is also called branch hazard
 It occurs when a branching decision is made before the branch condition is evaluated

Handling control hazard:- (solutions)
Stall the pipeline:- (Diagram)
Predict branch not taken:- (Diagram)
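The timing benefit shown in the pipeline diagrams above can be checked numerically: with k stages and n instructions, a non-pipelined machine needs n*k cycles, while an ideal pipeline needs only k + (n - 1):

```python
def cycles_without_pipeline(n_instructions, n_stages=5):
    # each instruction runs all stages before the next one starts
    return n_instructions * n_stages

def cycles_with_pipeline(n_instructions, n_stages=5):
    # the first instruction fills the pipe; the rest finish one per cycle
    return n_stages + (n_instructions - 1)

# the three LW instructions of the example: 15 cycles vs 7 cycles
```

This assumes no stalls; hazards add extra cycles on top of the ideal count.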

Predict branch taken:- (Diagram)

Delayed branch:- (Diagram)

5. Explain the pipelined data path and its control.

Stages of pipeline:-
 IF : Instruction fetch from memory
 ID : Instruction Decode
 EX : Execute the operation
 MEM : Access the memory for an operand
 WB : Write back the results in a register

For Load instruction:-
1. Instruction Fetch (IF)
 Read the instruction from memory using the address in PC
 Place the fetched instruction in the IF/ID pipeline register
 Increment the PC contents by 4 (PC <- PC + 4)

2. Instruction Decode (ID)
 The IF/ID pipeline register supplies the two registers to be read
 Read data from those two registers
 Store them in the ID/EX pipeline register

3. Execute instruction (EX)
 Read the reg1 contents and the sign-extended offset from the ID/EX pipeline register
 Add them using the ALU
 Place the sum in the EX/MEM pipeline register

4. Memory access (MEM)
 Read data memory using the address from the EX/MEM pipeline register
 Load the data into the MEM/WB pipeline register

5. Write Back (WB)
 Read the data from the MEM/WB pipeline register
 Write it into the register file
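The five steps for a load can be sketched as values moving through the pipeline registers. This is a purely illustrative model (dicts stand in for memories and the register file), not the actual hardware:

```python
def run_load(pc, instr_mem, reg_file, data_mem):
    # IF: fetch the instruction; PC <- PC + 4
    if_id = {"instr": instr_mem[pc], "npc": pc + 4}
    # ID: read the base register from the register file
    op, rt, base, offset = if_id["instr"]      # e.g. ("LW", 1, 0, 100)
    id_ex = {"base_val": reg_file[base], "offset": offset, "rt": rt}
    # EX: the ALU adds base register and offset to form the address
    ex_mem = {"addr": id_ex["base_val"] + id_ex["offset"], "rt": id_ex["rt"]}
    # MEM: read data memory at the computed address
    mem_wb = {"data": data_mem[ex_mem["addr"]], "rt": ex_mem["rt"]}
    # WB: write the loaded value back to the register file
    reg_file[mem_wb["rt"]] = mem_wb["data"]
    return reg_file

# LW R1, 100(R0) with R0 = 0 and M[100] = 42 leaves 42 in R1
regs = run_load(0, {0: ("LW", 1, 0, 100)}, {0: 0, 1: 0}, {100: 42})
```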

(Diagrams: Instruction execution, Memory access, Write back)

For Store instruction:-
1. Instruction Fetch (IF)
 Read the instruction from memory using the address in PC
 Place the fetched instruction in the IF/ID pipeline register
 Increment the PC contents by 4 (PC <- PC + 4)

2. Instruction Decode (ID)
 The IF/ID pipeline register supplies the two registers to be read
 Read data from those two registers
 Store them in the ID/EX pipeline register

3. Execute instruction (EX)
 Read the reg2 contents from the ID/EX pipeline register
 Add the base register and offset using the ALU
 Place the sum in the EX/MEM pipeline register

4. Memory access (MEM)
 Write the store data from the EX/MEM pipeline register into data memory at the computed address

5. Write Back (WB)
 Nothing is done in this stage
(Diagrams: same as above for load instructions)

UNIT 4 – PARALLELISM

PART – A

1. Distinguish between strong scaling and weak scaling.
Strong scaling | Weak scaling
Strong scaling means, at a constant problem size, the parallel speedup increases linearly with the number of processors used. | Weak scaling means the time to solve a problem of increasing size can be held constant by enlarging the number of processors used.
It is limited by Amdahl's law. | It is limited by memory.

2. Distinguish between UMA and NUMA
UMA | NUMA
It is a type of shared memory architecture. | It is a type of shared memory architecture.
All the processors are identical, connected to a network, and have equal access to all memory regions. | All the processors are identical, connected to a network, and have individual memory units attached to them.
They are also called Symmetric Multi-Processor machines (SMP). | They are also called Asymmetric Multi-Processor machines (AMP).
(Diagrams: processors P connected through a bus interconnection network to memory modules M)

3. What is Flynn's classification?
 Flynn classified parallel computer architectures based on the number of concurrent instruction and data streams
 They are: SISD, SIMD, MISD, MIMD

Name | Full form                         | No. of processors | No. of instruction streams | No. of data streams
SISD | Single Instruction Single Data    | 1 | 1 | 1
SIMD | Single Instruction Multiple Data  | N | 1 | N
MISD | Multiple Instruction Single Data  | N | N | 1
MIMD | Multiple Instruction Multiple Data| N | N | N

4. Define Multi-threading.
 The ability of a CPU or a processor to execute multiple processes or threads concurrently is called Multi-threading
 It allows multiple threads to share the functional units of a single processor in an overlapped fashion.

5. Define parallelism. What are its goals? Mention its types.
 Parallelism is defined as the process of doing multiple operations at the same time.
 Goals:-
o Speed up the processing, increase the speed.
o Increase the throughput
o Improve the performance
 Types:-
o Instruction-level parallelism
o Task parallelism
o Bit-level parallelism

6. What is ILP? What are the approaches to exploit ILP?
 The technique which is used to overlap the execution of instructions and improve performance is called Instruction-Level Parallelism
 Approaches:-
o Dynamic hardware-intensive approach
o Static compiler-intensive approach

7. Define Loop-level Parallelism
 The common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This is called Loop-level parallelism.

8. What are the types of dependencies?
 Data dependence
 Name dependence
 Control dependence

9. What are the types of Data hazard?
 RAW (Read After Write)
 WAW (Write After Write)
 WAR (Write After Read)

10. Define IPC
 Inter-process communication is defined as a set of programming interfaces that allow a programmer to coordinate activities among different program processes that can run concurrently in an operating system.
 It allows the program to handle many user requests concurrently.

11. Mention the three ways to implement Hardware MT
 Coarse-grained Multi-threading
 Fine-grained Multi-threading
 Simultaneous Multi-threading (SMT)

12. Mention the advantages of multi-threading
 To tolerate latency of memory operations, dependent instructions, etc.
 To improve system throughput by exploiting TLP
 To reduce context switch penalty

13. What are multi-core processors? Give its applications.
 A multi-core processor is a single computing component that contains two or more distinct cores in the same package.
 Applications: General purpose, embedded systems, networks, digital signal processing, graphics.
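The Amdahl's-law limit on strong scaling (Q1) can be illustrated with the standard speedup formula (the parallel fraction below is an arbitrary example value):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    # the serial part is unaffected; only the parallel part is divided
    serial = 1 - parallel_fraction
    return 1 / (serial + parallel_fraction / n_processors)

# 90% parallel code: speedup approaches but never reaches 10,
# no matter how many processors are used
s = amdahl_speedup(0.9, 10)   # about 5.26
```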

PART – B

1. Explain the Flynn's classification of parallel computer architectures with neat diagrams.
 Flynn proposed a concept for describing a machine's structure based on streams.
 Stream means a sequence of items
 There are two types of streams:-
o Data stream (sequence of data)
o Instruction stream (sequence of instructions)
 Flynn classified parallel computing architectures based on the number of concurrent instruction and data streams
 They are: SISD, SIMD, MISD, MIMD

Name | Full form                         | No. of processors | No. of instruction streams | No. of data streams
SISD | Single Instruction Single Data    | 1 | 1 | 1
SIMD | Single Instruction Multiple Data  | N | 1 | N
MISD | Multiple Instruction Single Data  | N | N | 1
MIMD | Multiple Instruction Multiple Data| N | N | N

SISD:-
 Single Instruction, Single Data
 The processor executes a SINGLE instruction on a SINGLE data stream
 Ex: IBM 704, VAX, CRAY-I
(Diagram: Control Unit -> CPU -> Memory, with instruction and data streams and I/O)

Advantages | Disadvantages
Simple and easy to implement. | Low performance achieved
Less penalty will be levied. | Low throughput is yielded
Less overhead will occur. | Low level of parallelism exploited

SIMD:-
 Single Instruction, Multiple Data
 A SINGLE instruction is executed on MULTIPLE data streams by multiple processors
 Ex: ILLIAC-IV, MPP, CM-2, STARAN
(Diagram: one Control Unit driving CPU1 to CPU-N, each with its own memory Memory1 to Memory-n)

MISD:-
 Multiple Instruction, Single Data
 MULTIPLE INSTRUCTIONS are executed on a SINGLE DATA stream by multiple processors
 Ex: Pipelined Architecture
(Diagram: Control Units 1 to N, each driving CPU1 to CPU-N, sharing one memory)

Advantages | Disadvantages
Better throughput than SISD | High complexity
Less penalty than SISD | High bandwidth required
Better performance than SISD | Low level of parallelism exploited

MIMD:-
 Multiple Instruction, Multiple Data
 MULTIPLE INSTRUCTIONS are executed on MULTIPLE DATA streams on multiple processors (CPUs)
(Diagram: multiple Control Unit + CPU pairs connected through a bus interconnection network to memories Memory-1 to Memory-n)

Advantages | Disadvantages
Better throughput than MISD | High complexity
Less penalty than MISD | Difficult to deploy and repair
High level of parallelism exploited | Difficult to learn

2. What is multi-threading? Explain hardware multi-threading and its classification with illustrations

Definition:-
 The ability of a CPU or a processor to execute multiple processes or threads concurrently is called Multi-threading
 It allows multiple threads to share the functional units of a single processor in an overlapped fashion.
Use:-
 To increase the usage of existing hardware resources.
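The SISD/SIMD distinction above can be sketched in code: SISD issues one instruction for one data item per step, while SIMD applies the same instruction across a whole data stream at once. Plain Python lists stand in for vector registers here, purely for illustration:

```python
def sisd_add(a, b):
    # one ADD instruction, one data item at a time
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result

def simd_add(a, b):
    # conceptually: a single vector ADD applied to all lanes at once
    return [x + y for x, y in zip(a, b)]
```

Both produce the same values; the difference on real hardware is how many instructions are issued.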

Purpose:-
 To tolerate latency of memory operations, dependent instructions, etc.
 To improve system throughput by exploiting TLP
 To reduce context switch penalty

Three ways to implement HMT:-
 Coarse-grained Multi-threading
 Fine-grained Multi-threading
 Simultaneous Multi-threading (SMT)

i) Coarse-grained multi-threading:-
 When a thread is stalled due to some event, switch to a different hardware context. This is called coarse-grained multi-threading
 It is also called switch-on-event multi-threading
 Advantages:-
o It eliminates the need for very fast thread switching
o It does not slow down the thread, because instructions from other threads are issued only when the thread faces a costly stall.
 Disadvantages:-
o Since the CPU issues instructions from one thread, when a stall occurs the pipeline must be emptied or frozen
o A new thread must fill the pipeline before instructions can complete

ii) Fine-grained multi-threading:-
 Switch to another thread in every cycle, such that no two instructions from the same thread are in the pipeline concurrently
 It improves the usage of the pipeline by taking advantage of multiple idle threads
 Advantages:-
o No need to check dependency between instructions, because only one instruction from a single thread is in the pipeline.
o No need for branch prediction logic
o Bubble cycles are used for executing useful instructions from different threads
o Improved system throughput, latency tolerance, and utilization
 Disadvantages:-
o Extra hardware complexity is created, because there are many hardware contexts and much thread-selection logic
o Single-thread performance is reduced
o Resource conflicts are created between the threads

iii) Simultaneous Multi-threading (SMT):-
 Intel introduced SMT in 2002 (Intel Pentium IV – 3.06 GHz)
 It uses the resources of a dynamically scheduled processor to exploit ILP
 At the same time as it exploits ILP, it converts TLP into ILP
 It also exploits the following features of recent processors:
o Multiple functional units: recent processors have more functional units than a single thread can use
o Register renaming and dynamic scheduling: multiple instructions from independent threads can co-exist and co-execute

Advantages:-
 More threads execute concurrently
 Best processor utilization is achieved
 High performance is achieved
Disadvantages:-
 It is a highly complex task for software developers to develop software implementing SMT on the given hardware
 There is also a security problem. Intel's hyper-threading technology has a drawback: on a system with many concurrent processes, one process can steal the login details of a program running in another process.

Illustration:-
(Diagram: issue slots for Superscalar, Coarse-grained, Fine-grained, and SMT execution of Thread 1, Thread 2, Thread 3)

3. What is ILP? Explain the methods to enhance the performance of ILP.
Definition:-
 The technique which is used to overlap the execution of instructions and improve performance is called Instruction-Level Parallelism
Principle:-
 There are many instructions in code that don't depend on each other, so it is possible to execute those instructions in parallel.
 Build compilers to analyse the code
 Build hardware to be even smarter than that code
Approaches:-
 Dynamic and hardware-intensive approach:-
o It depends on hardware to exploit the parallelism dynamically at run time.
o It is used in desktop and server processors and in a wide range of others.
o Ex: Pentium III and IV, Athlon, MIPS R10000/12000, Sun UltraSPARC III, PowerPC 603, Alpha 21264
 Static and compiler-intensive approach:-
o It depends on software technology to find parallelism statically at compile time.
o It is used in embedded systems
o Ex: Intel IA-64 architecture, Intel Itanium
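Fine-grained multi-threading's cycle-by-cycle thread switching can be sketched as round-robin issue from several thread instruction streams (a toy scheduler, not real issue logic):

```python
def fine_grained_issue(threads):
    # switch to another thread every cycle (round-robin), so that
    # no two instructions of one thread are adjacent in the pipe
    streams = [list(t) for t in threads]
    schedule, i = [], 0
    while any(streams):
        current = streams[i % len(streams)]
        if current:
            schedule.append(current.pop(0))
        i += 1
    return schedule

# two threads A1 A2 and B1 B2 issue interleaved: A1 B1 A2 B2
```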

Methods to enhance performance of ILP:-
i) LLP
ii) Vector instructions

Loop-level parallelism:-
 The common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This is called Loop-level parallelism.
 Ex:-
for ( i = 1; i <= 1000; i = i + 1 )
{
x[i] = x[i] + y[i];
}
 Every iteration of the loop can overlap with any other iteration.
 Within each loop iteration, there is little chance for overlap.
 LLP means parallelism existing within a loop.
 This parallelism can cross loop iterations

Techniques to convert LLP to ILP:-
 Loop unrolling: converting the loop-level parallelism into instruction-level parallelism.
 Either the compiler or the hardware is able to exploit the parallelism inherent in the loop
 Ex:-
for ( i = 1; i <= 1000; i = i + 4 )
{
x[i] = x[i] + y[i];
x[ i + 1 ] = x[ i + 1 ] + y[ i + 1 ];
x[ i + 2 ] = x[ i + 2 ] + y[ i + 2 ];
x[ i + 3 ] = x[ i + 3 ] + y[ i + 3 ];
}
 This technique works by unrolling the loop statically by the compiler or dynamically by the hardware.

Vector Instructions:-
 A vector instruction operates on a sequence of data items
 This sequence executes in four instructions:-
o Two instructions to load the vectors X and Y from memory
o One instruction to add the vectors
o One instruction to store the result vector
 Processors that exploit ILP have largely replaced vector-based processors
 But vector-based processors are still used in graphics, digital signal processing, and multimedia applications.

4. What are multicore processors? Explain their mechanisms and applications in detail
 A multi-core processor is a single computing component that contains two or more distinct cores in the same package.
 Multiple cores can run multiple instructions at the same time, and this increases the overall speed.
 It implements multiprocessing in a single physical component.

Previous technologies:-
(Diagram: early processors: 1 chip, 1 core, 1 thread; hyper-threading processors: 1 chip, 1 core, 2 threads; multi-core chips: multiple cores per chip)

Types of multicore processors:-
 Two cores:- Dual-core CPUs; Ex: AMD Phenom II X2, Intel Core Duo
 Four cores:- Quad-core CPUs; Ex: AMD Phenom II X4, Intel i5 and i7
 Six cores:- Hexa-core CPUs; Ex: AMD Phenom II X6, Intel i7 Extreme
 Eight cores:- Octa-core CPUs; Ex: Intel Xeon, AMD FX-8350
 Ten cores:- Deca-core CPUs; Ex: Intel Xeon E7-2850

Applications:-
 General purpose
 Embedded systems
 Networks
 Digital Signal Processing
 Graphics

Fundamental Theorem:-
 These processors take advantage of the relationship between power and frequency.
 Each core is able to run at a lower frequency
 The power that would be given to a single core is divided among all the cores.
 Therefore the performance is increased.
 This technique is used for designing dual-core, quad-core, hexa-core, and octa-core CPUs.
 The power consumed is less.
 To achieve this, expensive research techniques and equipment are needed, so big MNCs like Intel can do it:
o Continuous advances in silicon process technology from 65nm to 45nm to increase transistor density. Intel delivers superior energy-efficient performance transistors.
o Enhancing the performance of each core with the help of advanced micro-architectures every two years
o Improving the memory system and data access among the cores. This decreases the latency and increases the speed and efficiency.
o Optimizing the interconnect fabric that connects the cores to improve performance.
o Optimizing and expanding the instruction set to enhance the capabilities. If this is done, industries can use these Intel processors for producing advanced applications with high performance and low power.
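The loop-unrolling transformation shown under Q3 above preserves the result; a quick equivalence check (0-based indexing instead of the 1-based pseudocode, array length assumed to be a multiple of 4):

```python
def rolled(x, y):
    # original loop: one add per iteration
    for i in range(len(x)):
        x[i] = x[i] + y[i]
    return x

def unrolled_by_4(x, y):
    # unrolled loop: four independent adds per iteration expose ILP
    assert len(x) % 4 == 0
    for i in range(0, len(x), 4):
        x[i]     = x[i]     + y[i]
        x[i + 1] = x[i + 1] + y[i + 1]
        x[i + 2] = x[i + 2] + y[i + 2]
        x[i + 3] = x[i + 3] + y[i + 3]
    return x
```

Both versions compute the same array; the unrolled body simply gives the compiler or hardware four independent additions to schedule in parallel.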

Heterogeneous Multi-core processors:-
Advantages:-
 Massive parallelism is achieved
 Special types of hardware are available for different tasks
Disadvantages:-
 Developer productivity: training is needed to use the software.
 Portability: software written for one GPU will not run on other GPUs or CPUs
 Manageability: multiple GPUs and CPUs in a grid need a balanced workload.

5. Explain the types of dependences with examples.
Types:-
 Data dependence
 Name dependence
o Anti-dependence
o Output dependence
 Control dependence

Data dependence:-
 It is also called true data dependence
 An instruction 'j' is data dependent on instruction 'i' if any one of these conditions is true:-
 Condition 1: instruction 'i' produces a result that may be used by instruction 'j'.
instruction j -> instruction i
 Condition 2: instruction 'j' is data dependent on instruction 'k', and instruction 'k' is data dependent on instruction 'i'
instruction j -> instruction k -> instruction i
Ex:-
Loop: LOAD D F0, 0(R1)    ; F0 = array element
      ADD D F4, F0, F2    ; add scalar in F2
      STORE D F4, 0(R1)   ; store result
      DADDUI R1, R1, #8   ; increment the pointer by 8 bytes
      BNE R1, R2, LOOP    ; branch if R1 != R2
 Here, the dependency exists between all the instructions
 It is shown by the arrows (in the diagram)
 This order should not be changed
 If any order is changed, then it will create a hazard in the pipeline.

Importance of data dependence:-
 It tells whether a hazard will occur or not
 It tells the order of the instructions for execution
 It sets a limit on how much parallelism can be exploited

Overcoming data dependence:-
 Maintain the dependence but avoid the hazard
 Eliminate the dependence by changing the code

Name dependence:-
 It occurs when two instructions use the same register or memory location, called the "name", but there is no data flow between the instructions related to that "name"
 It is not true data dependence, because values are not transmitted between the instructions

Types of name dependence:-
Anti-dependence:-
 Anti-dependence between instruction 'i' and instruction 'j' occurs when instruction 'j' writes a register or memory location that instruction 'i' reads.
 The original order must be preserved to make sure that 'i' reads the correct value
Output dependence:-
 Output dependence occurs when instruction 'i' and instruction 'j' write the same register or memory location.
 The original order should be preserved to make sure that the final value is the one written by instruction 'j'

Register renaming:-
 Name dependence is not a true dependence; therefore the instructions can be executed simultaneously, or be reordered, if the name used in the two instructions no longer conflicts
 The renaming can be done for register operands, where it is called register renaming.
 It can be done statically by the compiler or dynamically by hardware

Control dependence:-
 A control dependence determines the correct order of an instruction 'i' with respect to a branch instruction
 So that the instruction 'i' is executed in correct program order
 For every instruction, control dependence is preserved.
 Ex:-
If P1
{
Statement1;
}
If P2
{
Statement2;
}
S1 is control dependent on P1
S2 is control dependent on P2 but not on P1

Conditions:-
1. An instruction that is control dependent on a branch cannot be moved before the branch, because then its execution WILL NOT BE CONTROLLED by the branch
Ex:- an ELSE block cannot be executed before its IF block
2. An instruction that is not control dependent on a branch cannot be moved after the branch, because then its execution IS CONTROLLED by the branch
Ex:- we cannot take a statement from the IF block and send it to the ELSE block

Preserving control dependence:-
 Instructions executed in program order: it makes sure that an instruction that occurs before a branch is executed before the branch.
 Detect the control hazard or branch hazard: it makes sure that an instruction that is control dependent on a branch is not executed until the branch direction is known.
 If the processor follows program order, then the control dependence is automatically preserved.

Ignoring control dependence:-
 It is not a must to preserve control dependence
 It can be violated by executing instructions that should not have been executed.
 If we want program correctness, exception behaviour and data flow must be preserved
“Chase Your Dreams; Dreams Do Come True” – Sachin Tendulkar
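The dependence classes discussed here can be checked mechanically from each instruction's source and destination registers. The helper below is a minimal sketch (the function name and register-set representation are invented for illustration, not part of the syllabus material):

```python
# Sketch: classify the dependences instruction j has on an earlier
# instruction i, given the destination and source registers of each.
def classify_dependence(i_dst, i_srcs, j_dst, j_srcs):
    deps = []
    if i_dst in j_srcs:
        deps.append("true (RAW)")    # j reads the value i writes
    if j_dst in i_srcs:
        deps.append("anti (WAR)")    # j overwrites a value i reads
    if i_dst == j_dst and i_dst is not None:
        deps.append("output (WAW)")  # both write the same "name"
    return deps

# I: ADD R1, R2, R3   followed by   J: ADD R4, R1, R5
print(classify_dependence("R1", {"R2", "R3"}, "R4", {"R1", "R5"}))
```

A pair that only shares a register name with no value flowing between them (the anti and output cases) is a name dependence; only the RAW case is true data dependence.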
Preserving exception behaviour:-
 Preserving exception behaviour means that any change in the ordering of the instructions must not change how exceptions are raised in the program
 Ex:-
      DADDU R2, R3, R4
      BEQZ  R2, L1
      LW    R1, 0(R2)
L1:   ...
 Problem: moving LW before BEQZ
 If the data dependence on R2 is not maintained, the result of the program can be changed
 If we ignore the control dependence and move LW before BEQZ, the LW instruction can create a memory protection exception

Preserving data flow:-
 Data flow is the flow of data among the instructions that produce results and those that consume them
 Branches make data flow dynamic, since they allow the source of data for a given instruction to come from many points
 Ex:-
      DADDU R1, R2, R3
      BEQZ  R4, L
      DSUBU R1, R5, R6
L:    ...
      OR    R7, R1, R8
 The value of R1 used by the OR instruction depends on whether the branch is taken or not
 The OR instruction is data dependent on both DADDU and DSUBU
 If the branch is taken, the value of R1 computed by DADDU is used by OR
 If the branch is not taken, the value of R1 computed by DSUBU is used by OR

Speculation:-
 Whether a violation of control dependence can affect exception behaviour or data flow is determined by examples like the following:-
 Ex:-
      DADDU R1, R2, R3
      BEQZ  R12, skipnext
      DSUBU R4, R5, R6
      DADDU R5, R4, R9
skipnext: OR R7, R8, R9

6. Explain the types of data hazards with examples.
Hazard:-
 A hazard is created whenever,
   o There is a dependence between instructions, and
   o They are close enough that the overlap created by pipelining would change the order of access to the operand involved in the dependence.
Types:-
 RAW (Read After Write)
 WAW (Write After Write)
 WAR (Write After Read)
 Consider two instructions i and j, with i occurring before j in program order.
RAW:-
 j tries to read data before i writes it
 So j gets the old value instead of the new value
 This is the most common type of hazard
 It is true data dependence
 Program order should be preserved to make sure that j receives its value from i
 Ex:-
      I: ADD R1, R2, R3
      J: ADD R4, R1, R5
WAW:-
 j tries to write data before it is written by i
 The writes are performed in the wrong order
 i comes first to write, j comes second; since j's write is the later one, the value written by j should remain
 But here the value written by i is left behind, which is wrong
 This is the WAW hazard
 It occurs in pipelines that allow an instruction to proceed even if a previous instruction is stalled
 Ex:-
      I: SUB R1, R4, R3
      J: ADD R1, R2, R3
WAR:-
 j tries to write data before it is read by i, so i gets the new value instead of the old value
 This is anti-dependence
 It does not occur in static pipelines
 It occurs when some instructions write their results early in the pipeline and other instructions read their data late in the pipeline, or when the instructions are reordered
 Ex:-
      I: ADD R4, R1, R5
      J: SUB R5, R1, R2
RAR:-
 Read After Read is not a hazard
 Any number of read operations can be done, because a read does not change any data

7. Explain the challenges in parallel processing.
 Concurrency
   o Reduce latency
   o Hide latency
   o Increase throughput
 Data distribution
 IPC
   o Cost of communication
   o Latency vs Bandwidth
   o Visibility of communications
   o Synchronous vs asynchronous communication
   o Scope of communication
   o Efficiency of communication
 Load balancing
   o Equal partition
   o Dynamic assignment
 Implementation and debugging
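The RAW pair from the example above can be made concrete with a toy interpreter (the register file and the ADD-only executor are invented for illustration): running J before I hands J the stale value of R1.

```python
# Sketch: execute tiny ADD-only "instructions" against a register file.
def run(program, regs):
    for dst, a, b in program:
        regs[dst] = regs[a] + regs[b]
    return regs

init = {"R1": 0, "R2": 2, "R3": 3, "R4": 0, "R5": 5}
I = ("R1", "R2", "R3")   # I: ADD R1, R2, R3
J = ("R4", "R1", "R5")   # J: ADD R4, R1, R5  (reads the R1 written by I)

in_order  = run([I, J], dict(init))  # J sees the new R1 = 5
reordered = run([J, I], dict(init))  # J sees the old R1 = 0
print(in_order["R4"], reordered["R4"])
```

The two results differ, which is exactly why program order must be preserved for a true (RAW) dependence.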
Concurrency:-
 It is a property of a system representing the fact that more than one activity can be executed at the same time
 The algorithm should be divided into groups of operations; only then is performance improved by parallelism
 Not all problems have the same amount of concurrency
 The cleverness and experience of the programmer allow an algorithm to achieve maximal concurrency
 Three ways of improving performance using concurrency:-
   o Reduce latency: work is divided into small parts and executed concurrently
   o Hide latency: long-running tasks are executed together concurrently
   o Increase throughput: if we execute multiple tasks concurrently, the throughput of the system is increased

Data distribution:-
 Distribution of a problem's data is a challenge
 Older parallel computers have data locality
 This means some data is stored in memory that is closer to a particular processor and can be accessed quickly
 Data locality occurs because each processor has its own local memory
 Because of data locality, a parallel programmer must concentrate on where the data is stored with respect to the processors
 If more values are local, the processor will access them quickly and complete the work
 Distributing the data and distributing the computation are tightly coupled
 For an optimal design, we have to concentrate on both of them

IPC:-
 Inter Process Communication
 It is a set of programming interfaces that allow a programmer to coordinate activities among different program processes that can run concurrently in an OS
 It allows a program to handle many user requests at the same time
 These factors are to be considered in IPC:-
 Cost of communications:-
   o IPC virtually always creates overhead
   o Usually, machine cycles and resources are used for computation
   o But here they are used to pack and transmit data
 Latency vs Bandwidth:-
   o Latency means the time taken to send a minimal message from point A to point B (expressed in microseconds)
   o Bandwidth means the amount of data that can be sent per unit of time (expressed in MB/s or GB/s)
   o Sending many small messages adds latency and creates communication overhead
   o To make better use of the bandwidth, many small messages are packed into one large message
 Visibility of communications:-
   o If we use message passing for IPC, direct communication can be made; it is visible and under the control of the programmer
   o If we use the data parallel model, communications are transparent, particularly on distributed memory architectures
   o The programmer cannot know exactly how the message passing IPC is working
 Synchronous and asynchronous communications:-
   o Synchronous communications are also called blocking communications, because the work must wait until the communications are completed
   o Asynchronous communications are also called non-blocking communications, because the work can continue even if the communications are not completed
 Scope of communications:-
   o It is hard to identify the tasks that must communicate with each other during the design stage of a parallel code
 Efficiency of communications:-
   o Communication should be efficient; only the important messages should be transmitted between the tasks

Load balancing:-
 Load balancing means the practice of distributing approximately equal amounts of work among tasks, so that all tasks are kept busy all the time
 It is important for the performance of parallel programs
 It can be achieved by:-
 Equal partition:-
   o When tasks receive work, divide it equally
   o For array/matrix operations, each task does similar work, so equally distribute the data among the tasks
   o For loop iterations, the work done in every iteration is similar, so distribute the iterations across the tasks
 Dynamic assignment:-
   o Certain types of problems create load imbalance even after the data is distributed equally among the tasks:-
      Sparse arrays: some tasks have actual data to work on, while others have mostly zeroes
      Adaptive grid: some tasks need to refine their mesh while others do not
      N-body simulations: some particles may migrate from the original task domain to another task domain
   o If the amount of work per task cannot be predicted, we can use a scheduler/task-pool approach: when a task finishes its work, it joins a queue to get new work
   o We need to design an algorithm that detects and handles load imbalances as they occur dynamically inside the code

Implementation and debugging:-
 Programmers need to design parallel algorithms by creating a single task that executes on each processor
 The program is designed to perform different calculations and communications based on the processor's ID
 This is called Single Program Multiple Data (SPMD); its advantage is that only one program must be written
 Another way is Multiple Program Multiple Data (MPMD)
 In SPMD and MPMD, the executables must be created to cooperatively perform the computation while managing the data
 To implement such a program, knowledge of sequential programming is needed
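The equal-partition strategy described under load balancing can be sketched as follows (the helper name is invented; it splits N loop iterations as evenly as possible across a given number of tasks):

```python
# Sketch: divide n_iters loop iterations near-equally among n_tasks.
def partition(n_iters, n_tasks):
    base, extra = divmod(n_iters, n_tasks)
    chunks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)  # spread the remainder
        chunks.append(range(start, start + size))
        start += size
    return chunks

print([list(c) for c in partition(10, 3)])
```

Chunk sizes differ by at most one iteration, which is the "approximately equal amounts of work" the notes ask for when every iteration costs about the same.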
UNIT 5 – MEMORY & I/O SYSTEMS
PART – A
1. Differentiate between volatile and non-volatile memory.
Volatile memory | Non-volatile memory
Memory that loses its contents when the computer is switched OFF | Memory that does not lose its contents when the computer is switched OFF
Its contents must be refreshed periodically | Its contents need not be refreshed periodically
Ex: RAM | Ex: ROM
2. Differentiate between SRAM and DRAM.
SRAM | DRAM
Static Random Access Memory | Dynamic Random Access Memory
Information is stored in a one-bit cell, a flip-flop | Information is stored as charge across a capacitor
Contents are retained as long as power is ON | Contents are lost unless refreshed, even while power is ON
No periodic refresh needed | Needs periodic refresh
Less packaging density | High packaging density
More complex hardware | Less complex hardware
More expensive | Less expensive
3. Define locality of reference.
 Instructions in a localized area of the program are executed repeatedly during some period, while the rest of the program is not accessed frequently.
 This is called locality of reference
 Reference stays within the locality = locality of reference
 Ex: simple loops, nested loops
4. What are the types of locality of reference?
 Temporal Locality of Reference (locality in time)
   o Recently executed instructions are likely to be executed again
   o Ex: loops, reuse
 Spatial Locality of Reference (locality in space)
   o Instructions stored near the recently executed instructions are also likely to be executed soon.
   o Ex: straight-line code, array access.
5. What are the techniques to improve cache performance?
 Reducing the miss rate: reduce the chance that two different memory blocks fight for the same cache location
 Reducing the miss penalty: add an additional level to the hierarchy, called multi-level caching.
6. What is the formula for calculating CPU execution time?
CPU execution time = (CPU clock cycles + Memory-stall clock cycles) x Clock cycle time
7. Define virtual memory.
 Virtual memory is a technique that is used to extend the size of the physical memory.
 In the virtual memory concept, the operating system moves programs and data between main memory and secondary memory.
 It is also called imaginary memory.
 Main memory acts as a cache for secondary memory.
8. Define TLB. What is its purpose?
 Translation Look-aside Buffer
 It is a cache memory.
 The page table is kept in main memory, but a copy of a small portion of it is placed within the memory management unit. This is called the TLB.
 It contains the page table entries of the most recently accessed pages and their virtual addresses.
 It can contain 32 page table entries
 A TLB coupled with a 4KB page size covers 128KB of memory addresses
9. What is DMA?
 Direct Memory Access
 Transferring a large block of data directly between an external device and main memory is called DMA
 The external device controls the data transfer
 The external device generates the address and control signals to control the data transfer.
 This external device which controls the data transfer is called the DMA controller.
10. Define interrupts.
 The event that creates an interruption is called an interrupt.
 The special routine that is executed to service the interrupt is called the interrupt service routine.
 It is an external event that affects the normal flow of execution.
 It is caused by external hardware such as a keyboard, mouse, printer, etc.
11. What is an exception? What are the types of exceptions?
 An interrupt stops the currently executing program and starts another program.
 This interrupt is created by external hardware.
 Like this, many events can create interrupts.
 All such events that stop the current program and start a new one are called exceptions.
 Types: Faults, Traps, Aborts.
 Faults: exceptions that are detected and serviced before execution of the instruction that creates the problem
 Traps: exceptions that are reported immediately after execution of the instruction that creates the problem
 Aborts: exceptions that do not allow execution of the instruction that creates the problem.
12. What are the features (or) functions of an IOP?
 An IOP can fetch and execute its own instructions
 Its instructions are specially designed for I/O processing
 The 8089 IOP can perform data transfer, arithmetic and logical operations, branches, searching and translation.
 It also performs I/O transfer, device set-up, programmed I/O and DMA operation.
 It can transfer data from an 8-bit source to a 16-bit destination
 It supports a multiprocessing environment
13. Differentiate between programmed I/O and DMA.
Programmed I/O | DMA
Software-controlled data transfer | Hardware-controlled data transfer
Data transfer speed is low | Data transfer speed is high
CPU is involved in the transfer | CPU is not involved in the transfer
No controller is needed | A DMA controller is needed
During transfer, data goes through the processor | During transfer, data does not go through the processor
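The Part A formula for CPU execution time can be exercised with made-up numbers (the cycle counts and the 0.5 ns clock below are assumptions for illustration only):

```python
# Sketch of the Part-A formula:
# CPU time = (CPU clock cycles + memory-stall clock cycles) * clock cycle time
cpu_cycles   = 2_000_000   # assumed
stall_cycles =   500_000   # assumed
cycle_time   = 0.5e-9      # assumed 0.5 ns clock (a 2 GHz processor)

cpu_time = (cpu_cycles + stall_cycles) * cycle_time
print(cpu_time)            # total execution time in seconds
```

Note how the memory stalls add directly to the compute cycles before being scaled by the cycle time, which is why reducing the miss rate or miss penalty (Q5) shortens execution time.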
PART – B

1. Explain the various memory technologies in detail with neat diagrams if necessary.
 There are five basic memory technologies in current use. They are:-
 RAM (SRAM, DRAM)
 ROM (PROM, EPROM, EEPROM)
 Flash memory (flash cards, flash drives)
 Magnetic disk memory
 Optical disk memory (CD-R, CD-RW, DVD-R, DVD-RW)
RAM:-
 RAMs are classified into SRAM and DRAM
 They can store data only while the power is ON
SRAM:-
 SRAM means Static Random Access Memory
 SRAMs are built on MOS and bipolar technology.
 MOS: MOS SRAM cell; Bipolar: TTL RAM cell
MOS SRAM cell:-
 Enhancement-mode MOSFET transistors are used.
 T1 and T2 form the basic cross-coupled inverters
 T3 and T4 act as load resistors for T1 and T2
 X and Y lines are used for addressing the cell.
 When X and Y are both HIGH (1), the cell is selected
 When X = 1, T5 and T6 are ON, and the cell is connected to the data and data-complement lines
 If Y = 1, then T7 and T8 are ON.
 Because of this, either READ or WRITE is possible.
WRITE operation:-
 Enable W = 1
 If W = 1 and Din = 1, node D is also 1.
 This makes T2 ON and T1 OFF.
 If the next value of Din is 0, then T2 turns OFF and T1 turns ON
READ operation:-
 Enable R = 1
 If R = 1, T10 turns ON.
 This connects the data-output line to Dout
 So the complement of the bit stored in the cell is available at the output.
TTL RAM cell:-
 TTL – Transistor-Transistor Logic
 The bipolar memory cell is implemented using TTL multiple-emitter technology
 It stores 1 bit of information (0 or 1)
 It is just like a flip-flop
 Information remains as long as power is ON
 X and Y select lines select a cell from the matrix.
 Q1 and Q2 are cross-coupled inverters (one is always OFF while the other is ON)
 If Q1 is ON and Q2 is OFF, 1 is stored in the cell.
 If Q1 is OFF and Q2 is ON, 0 is stored in the cell.
 The state of the cell is changed to "0" by applying "HIGH" to the Q1 emitter
 This turns Q1 OFF
 If Q1 is OFF, then Q2 will be ON (one should be ON always)
 As long as Q2 is ON, the Q2 collector is LOW.
 A 1 can be rewritten by applying "HIGH" to the Q2 emitter
DRAM:-
 Just as a capacitor stores charge, a DRAM cell stores data as charge
 A DRAM chip contains thousands of cells like the one in the diagram.
 When the column (SENSE) and row (CONTROL) lines are HIGH, the MOSFET conducts and charges the capacitor
 When the SENSE and CONTROL lines are LOW, the MOSFET opens and the capacitor's charge is locked in.
 In this way, the cell stores 1 bit.
 Since only a single MOSFET and a capacitor are needed, a DRAM chip contains more memory cells than an SRAM chip of the same size
 The information is lost if the power is switched OFF
 The memory must be refreshed periodically, every few milliseconds.
 It uses less complex hardware and is less expensive
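The DRAM behaviour above, a bit held as capacitor charge that leaks away and must be refreshed, can be caricatured in a few lines (the decay factor and sense threshold are invented for illustration; real cells and refresh intervals differ):

```python
# Toy model of one DRAM cell: the bit is charge that leaks between
# refreshes; a refresh re-drives a full level if the sensed charge is
# still above the (invented) threshold.
class DRAMCell:
    def __init__(self):
        self.charge = 0.0
    def write(self, bit):
        self.charge = 1.0 if bit else 0.0
    def leak(self):
        self.charge *= 0.5                 # charge decays over time
    def refresh(self):
        self.charge = 1.0 if self.charge > 0.1 else 0.0
    def read(self):
        return 1 if self.charge > 0.1 else 0

cell = DRAMCell()
cell.write(1)
cell.leak(); cell.leak()                   # some time passes
cell.refresh()                             # periodic refresh rescues the 1
print(cell.read())
```

Leaving the same cell unrefreshed long enough drops the charge below the sense threshold, and the stored 1 silently becomes a 0; that is the whole reason the refresh circuitry exists.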
Write operation:-
 To enable a WRITE operation, the R/W line is made LOW
 This enables the input buffer and disables the output buffer
 To write "1" into the cell: Din = HIGH, transistor = ON, ROW line = HIGH
 This allows the capacitor to charge to a positive voltage
 When 0 is to be stored, LOW is applied to Din.
 The capacitor remains uncharged, and if it was storing "1", the capacitor is discharged.
 When the ROW line is made LOW, the transistor turns OFF and disconnects the capacitor from the data line, therefore storing the charge (0 or 1) on the capacitor.
[Figure: Writing "1" into a DRAM cell]
[Figure: Writing "0" into a DRAM cell]

Read operation:-
 To read data from the cell, the R/W line is made HIGH
 This enables the output buffer and disables the input buffer
 Then the ROW line is made HIGH.
 It turns the transistor ON and connects the capacitor to the Dout line through the output buffer
[Figure: Reading "1" from a DRAM cell]

Refresh operation:-
 To enable a refresh operation, the R/W line, ROW line and REFRESH line are made HIGH
 This turns the transistor ON and connects the capacitor to the COLUMN line
 As R/W is HIGH, the output buffer is enabled
 The stored data bit is applied to the input of the refresh buffer
 The enabled refresh buffer produces a voltage on the COLUMN line corresponding to the stored bit
 Therefore the capacitor is refreshed.
[Figure: Refreshing "1" in a DRAM cell]

SDRAM:-
 A DRAM whose operation is directly synchronized with a clock signal is called an SDRAM
 Synchronous Dynamic Random Access Memory
 In a plain DRAM, the processor sends addresses and control signals to the memory.
 After some time delay, the DRAM either reads or writes the data
 During this delay, the DRAM performs various internal functions
 The processor has to wait out this delay.
 To avoid this problem, SDRAM was produced.
 An SDRAM exchanges data with the processor synchronized to an external clock signal.
 This lets the processor read and write data without waiting on the memory
 The SDRAM latches the address sent by the processor and then responds after a fixed number of clock cycles.
 Meanwhile the processor can do other tasks.
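The SDRAM timing described above (latch the address, respond after a fixed number of cycles, then stream the burst) can be captured in a small cycle-count sketch. The function name and the fixed-latency model are assumptions for illustration:

```python
# Sketch: clock cycles to read a burst, assuming a fixed access
# latency followed by one word per cycle (two per cycle for DDR,
# which transfers on both clock edges).
def burst_cycles(burst_len, access_latency, ddr=False):
    words_per_cycle = 2 if ddr else 1
    transfer = -(-burst_len // words_per_cycle)  # ceiling division
    return access_latency + transfer

print(burst_cycles(4, 3))             # SDR burst of 4
print(burst_cycles(4, 3, ddr=True))   # DDR halves the transfer cycles
```

This also previews the DDR SDRAM point made next: moving data on both clock edges doubles the bandwidth, so the same burst completes in fewer cycles.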
[Figure: Timing diagram of a burst data transfer of length 4]

DDR SDRAM:-
 The fastest version of SDRAM
 DDR  Double Data Rate
 An SDRAM performs operations on the rising edge of the clock signal
 But a DDR SDRAM performs operations on both edges of the clock signal
 The bandwidth is therefore doubled in DDR
 It is also called faster SDRAM
 Two banks of cell arrays are present in a DDR SDRAM
 It is a dual-bank architecture
 Each bank can be accessed separately
 Nowadays, DDR versions II and III have been released

ROM:-
 ROMs can store data even after the power is OFF
 We cannot write data to a ROM
 Non-volatile memory
 It is used to store binary codes
 It contains only diodes and a decoder
 Address lines A0 and A1 are decoded by a 2 : 4 decoder

PROM:-
 Programmable ROM
 It has a diode in every bit position
 The output is initially all 0s
 Each diode has a fusible series link.
 By addressing a bit and applying a proper current pulse at the output, we can blow that fuse and store a "1" at that bit
 The fuse is made of nichrome
 For blowing a fuse, a 20 – 50 mA current is passed for 5 – 20 µs
 This blowing is done according to the truth table of the PROM
 PROM programmers can do it programmatically
 That is why it is called a PROM
 PROMs are one-time programmable; once programmed, the information cannot be erased.

EPROM:-
 Erasable PROM
 EPROMs use a MOS circuit
 They store 0s and 1s as packets of charge in the IC.
 They can also be programmed by EPROM programmers
 We can erase the data in an EPROM by exposing the chip to UV light through a quartz window for 15 – 20 minutes
 We cannot erase selected information; all the information is erased at once.
 It can be re-programmed and re-used many times

EEPROM:-
 It also uses a MOS circuit
 Data is stored as: CHARGE or NO CHARGE
 A 20 – 25 V charge is used to move the charges
 We can selectively erase information
 EEPROMs are more expensive than ROMs

Flash memory:-
 Flash memories are R/W memories (both READ and WRITE)
 We can read the contents of a single cell, but we can only write a whole block of cells at a time
 It is based on a single transistor controlled by trapped charge
 They have higher capacity and lower power consumption
 It is suitable for laptops, tablets, smartphones, iPods, etc.
 Types: Flash card (memory card)  1 GB to 64 GB
        Flash drive (pen drive)  up to 64 GB capacity
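The PROM's one-time programmability can be mimicked directly (the class and method names are invented for illustration): every bit starts at 0, and blowing a fuse irreversibly stores a 1.

```python
# Sketch of one-time-programmable PROM behaviour: outputs start all 0,
# and a blown fuse can never be returned to 0.
class PROM:
    def __init__(self, n_bits):
        self.bits = [0] * n_bits      # all outputs initially 0
    def blow_fuse(self, addr):
        self.bits[addr] = 1           # irreversible: no way back to 0
    def read(self, addr):
        return self.bits[addr]

rom = PROM(8)
rom.blow_fuse(3)
print(rom.bits)
```

Contrast with EPROM/EEPROM above: those add an erase step (UV light or electrical), so the same cells can be reprogrammed, whereas this model deliberately has no erase method.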
Magnetic disk memory:-
 It is a thin circular metal plate coated with a thin magnetic film
 Digital information is stored on it by magnetizing the magnetic surface
 The disk is mounted on a spindle and spins on its axis, while the read/write (magnetizing) head moves over its surface
 It is usually connected to a computer using a SCSI bus
 The transfer speed on a SCSI bus is much faster

Optical disk:-
 CD-ROM:-
   o Compact Disk – ROM, maximum capacity 700MB
   o Data is stored on a single side; the other side is used for the label
   o Data recording is done by focusing a laser beam on the surface of the spinning disk
   o The disc is divided into tracks and sectors
Merits – CD | Demerits – CD
Large capacity compared to ROM | Read only, cannot be updated
Cheaper, light weight | Access is slow compared to a magnetic disk
Reliable, removable and efficient | Needs careful handling, easily gets scratched
 CD-RW:-
   o We can both read and write data on this CD
   o Maximum capacity of 700MB
   o Light weight, reliable, removable, efficient
   o A lot of space is wasted on the outer tracks
 DVD:-
   o Digital Versatile Disk
   o It is used for many purposes; that is why it is called "versatile"
   o We can store data on both sides
   o Available in 4.7GB, 8.54GB, 9.4GB and 17.08GB capacities
   o Larger capacity than a CD
   o We can store full movies or an OS on one single DVD

2. Define cache memory. Explain the types of cache memories and cache updating policies.
Cache:-
 Every time, the processor of a computer system has to fetch program and data from the main memory for its operations; but this is time consuming.
 So a new kind of memory is introduced to hold a copy of the frequently used data; it can be accessed very fast because it is very small in size.
 This is called cache memory.
 It is much smaller than RAM and is placed between RAM and the processor

   Processor <-> Cache (SRAM) <-> Main Memory (DRAM)
                     |
               Cache Controller

 The cache is made of faster memory (SRAM)
 Main memory (RAM) is made of DRAM (slower)
 If the processor requests data that is not available in the cache, it is called a cache miss
 If the requested data is available in the cache, it is called a cache hit
 Data is stored in blocks of memory.
 The cache controller decides which memory block should be moved into or out of the cache and main memory
 Locality of reference is responsible for the best usage of the cache
 Instructions in a localized area of the program are executed repeatedly during some period, and the remainder of the program is not accessed frequently. This is called locality of reference. Ex: simple loops, nested loops
Temporal locality:- (temporal  time)
 Recently executed instructions have a high chance of being executed again (very soon).
 It is also called locality in time
 Example: loops, reuse
 Whenever data is needed, it should be brought into the cache.
Spatial locality:- (spatial  space)
 Instructions stored near recently executed instructions also have a high chance of being executed soon.
 Ex: straight-line code, array access, etc.
 Whenever data is needed, not only that particular data but the whole memory block is placed into the cache.
Types:-
Primary cache:-
 It is also called processor cache (within the processor)
 It is also called L1 (or) Level 1 cache
Secondary cache:-
 It is also called Level 2 (or) L2 cache
 It is placed between the primary cache and main memory (RAM)
Merits – Cache | Demerits – Cache
Faster than main memory | Very small in size
Quick access time | Very expensive
Stores data quickly | Difficult to design

Cache updating policies:-
 The cache stores only some blocks at a time.
 Since the cache is smaller than the set of all blocks in main memory, only the active segments of the program are placed in the cache; execution time is reduced
 When the processor requests a word that is not present, the cache controller decides which block should be moved out of the cache.
Read hit:-
 The data requested by the processor is available in the cache
 That data is obtained from the cache and sent to the processor
Write hit:-
 The cache holds copies of data in main memory.
 Write-through protocol: the contents of the cache and main memory are updated simultaneously, to avoid inconsistency
 Write-back protocol: only the cache contents are updated, and the updated block is marked with a dirty/modified bit; the main memory contents are updated when the block has to be removed from the cache to make room for a new block
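The two write-hit policies can be contrasted in a few lines (the dict-based cache/memory and the function name are invented for illustration):

```python
# Sketch: a write hit under the two updating policies. Write-through
# updates cache and memory together; write-back only marks the cache
# line dirty, deferring the memory update until eviction.
def write_hit(cache, memory, addr, value, policy):
    cache[addr] = {"value": value, "dirty": policy == "write-back"}
    if policy == "write-through":
        memory[addr] = value      # both copies updated at once

cache, memory = {}, {10: 0}
write_hit(cache, memory, 10, 99, "write-through")
print(memory[10])                 # memory already holds 99

cache, memory = {}, {10: 0}
write_hit(cache, memory, 10, 99, "write-back")
print(memory[10])                 # still 0: memory updated on eviction
```

The dirty bit is what tells the controller, at eviction time, that this block must be written back to main memory before it is replaced.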
Read miss:-
 During a READ, if the requested word is not in the cache, a READ MISS occurs
 The block of words that contains the requested word is copied from main memory into the cache
 After the entire block is loaded into the cache, the requested word is sent to the processor.
 To reduce the waiting time, the load-through / early-restart technique is used
Write miss:-
 If the requested word does not exist in the cache during a write operation, a WRITE MISS occurs.
 If the write-through protocol is used, the data is directly written to main memory.
 If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and then the required word in the cache is overwritten with the new data.

3. Explain the techniques used to reduce the cache miss. (or)
Explain the methods of mapping functions and how they are useful in improving cache performance. (or)
Explain the mapping techniques in cache with a neat diagram.
 Usually, a cache memory can store only a limited number of blocks at a time. So it can hold only a very small portion of the blocks of main memory
 This management of blocks between main memory and cache memory is called the mapping function.
 There are two kinds of mapping techniques in cache organization:-

   Mapping techniques
     Direct mapping
     Associative mapping
        o Fully associative
        o Set associative

Before going into the techniques, some assumptions are made:-
 The cache consists of 128 blocks of 16 words each
 Total cache size = 128 x 16 = 2048 (2K) words
 1 page in main memory = a group of 128 blocks of 16 words each
 Main memory has 32 pages
 128 x 16 = 2048; 2048 x 32 = 65536
 So main memory has 65536 words
Direct mapping:-
 The simplest mapping technique
 Each block from main memory has exactly one location in the cache.
 Block i of main memory is mapped to block i mod 128 of the cache
 Main memory blocks 0, 128, 256, … are stored in cache block 0
 Main memory blocks 1, 129, 257, … are stored in cache block 1
 Here, the address is divided into three fields: Tag, Block, Word
 Word field: selects a word out of the 16 words in a block
 Block field: contains 7 bits, because there are 128 blocks in the cache (2^7 = 128)
 Tag field: contains 5 bits, and selects a page among the 32 pages in main memory
 The higher-order 5 bits are compared with the tag bits associated with that cache location
 If they match, the required word is present in that cache block
 If they do not match, the required block is not present in the cache, so it is read from main memory and loaded into the cache
Merits – direct mapping | Demerits – direct mapping
Easy to implement and understand | The processor may need to access the same memory location from two different pages of main memory frequently, but only one of them can be present in the cache at a time
Less time is consumed, since the cache is directly mapped with main memory | Not flexible

Fully associative mapping:-
 A main memory block can be placed into any cache block position
 The address contains only two fields: Word, Tag
 Tag: identifies a memory block when it is in the cache
 The higher-order 12 bits of the address received from the CPU are compared with the tag bits of each cache block, to check whether the required block is present or not.
 If the required block is present in the cache, the Word field is used to find the required word within the cache block
 We have the freedom of choosing any cache location for storing a main memory block
 When a new block enters the cache, an old block has to be removed only if the cache is full.
 Here, for the replacement of cache blocks, replacement algorithms are used (LRU, LFU, FIFO, Random).
 The higher-order bits of the main memory address are compared with all 128 tags, corresponding to each block, to check whether the requested block is present in the cache.
Merits – associative mapping | Demerits – associative mapping
A main memory block can be placed anywhere in the cache | The tag bits must be compared with all 128 tags of the cache to check whether a block is present or not
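Using the section's own numbers (16-word blocks, 128 cache blocks, 32 pages), a word address splits into the direct-mapping fields like this (the helper name is invented for illustration):

```python
# Sketch: split a word address into the Tag / Block / Word fields of
# the direct-mapped cache described above (16 words/block, 128 blocks).
WORDS_PER_BLOCK, CACHE_BLOCKS = 16, 128

def split_address(addr):
    word  = addr % WORDS_PER_BLOCK
    block = (addr // WORDS_PER_BLOCK) % CACHE_BLOCKS  # block i -> i mod 128
    tag   = addr // (WORDS_PER_BLOCK * CACHE_BLOCKS)  # page number
    return tag, block, word

# word 5 of main-memory block 130: lands in cache block 2, tag (page) 1
print(split_address(130 * WORDS_PER_BLOCK + 5))
```

Blocks 2 and 130 produce the same block field but different tags, which is exactly the conflict the tag comparison (and the demerit in the table above) is about.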
Two-way set associative mapping:-
 Set associative = direct mapping + associative mapping
 Many groups of direct-mapped blocks operate as many direct-mapped caches in parallel
 A block of data from any page in main memory can go into a particular block of the directly mapped cache
 The number of required address comparisons depends on the number of direct-mapped caches in the cache system
 These comparisons are always fewer than the comparisons in fully associative mapping
 Size of 1 page in main memory = size of 1 directly mapped cache
 It is called two-way set associative because each block from main memory has two choices for block placement.
 Main memory blocks 0, 64, 128, … can map into either one of the cache blocks of set 0
 Main memory blocks 1, 65, 129, … can map into either one of the cache blocks of set 1, and so on
 Three fields are needed
 Word field: selects one of the 16 words in a block
 Set field: finds the requested block among set 0 to set 63
 Tag field: 6 bits, because there are 64 pages (2^6)
Merits – set associative:-
 Two directly mapped caches are available, so only two comparisons are required to check whether a given block is present or not
 Reduced hardware cost
 Improved cache hit ratio

4. Explain the organization of virtual memory and its address translation technique with neat diagrams.
 In modern computers, main memory is not enough for all the operations required by the processor of a computer
 So the virtual memory (VM) technique is used to extend the size of main memory (RAM)
 It uses secondary storage such as disks, pen drives, etc.
 Virtual means imaginary. An imaginary memory is created by the operating system, so that the user gets the feeling that the main memory is very large.
 For example, if a 32GB movie has to be played, main memory is not enough to store it, so the 32GB movie is divided into segments.
 The currently playing segment of the movie is kept in main memory; the remaining segments are stored in secondary storage.
 If the next segment of the movie is needed, it replaces the previous segment in main memory
 The OS is responsible for the management of VM
 Here, the addresses issued by the processor are called virtual addresses / logical addresses
 They are converted into physical addresses (real addresses).
 Similarly, many applications can be run on a computer at the same time, such as MS Word, VLC, games, etc.
 There is not enough space in main memory to contain all these applications
 But in each of these applications, only a small part is active at a time; so it is enough to load that part alone into RAM
 This concept is called VM
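A toy version of the two-way set-associative placement above (64 sets, two blocks per set) can be sketched as follows. FIFO replacement is used here just to pick a victim; the notes allow LRU, LFU, FIFO or random, and the data structure is invented for illustration:

```python
# Sketch: two-way set-associative cache with 64 sets; a block's set is
# (block number mod 64), and a full set evicts its oldest tag (FIFO).
N_SETS, WAYS = 64, 2
cache = {s: [] for s in range(N_SETS)}   # each set holds up to 2 tags

def access(block_number):
    s, tag = block_number % N_SETS, block_number // N_SETS
    if tag in cache[s]:
        return "hit"
    if len(cache[s]) == WAYS:
        cache[s].pop(0)                  # FIFO victim selection
    cache[s].append(tag)
    return "miss"

# blocks 0, 64 and 128 all compete for set 0
results = [access(b) for b in (0, 64, 0, 128, 0)]
print(results)
```

With only two ways, a third block mapping to the same set forces an eviction; in a direct-mapped cache the second access to block 64 would already have evicted block 0, so the third access would miss instead of hit.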
Address Translation:-

 The virtual address is broken into a virtual page number and a page offset
 The virtual page number is converted to a physical page number
 The physical page number forms the upper portion of the physical address; the page offset forms the lower portion
 The number of bits in the page offset decides the page size
 A page table is used to maintain information about the main memory location of each page
 The address at which each page is stored in main memory, and the current status of the page, are kept in the page table
 To find the address of the corresponding entry in the page table: virtual page number + contents of the page table base register
 The page table base register holds the starting address of the page table
 The entry in the page table gives the physical page number
 Add this physical page number + offset → to get the physical address in main memory
 If the required page is not present in main memory, a PAGE FAULT occurs; that page is loaded from secondary storage to main memory by a program called the PAGE FAULT ROUTINE
 The technique of getting the desired page into main memory is called DEMAND PAGING
 To support demand paging and VM, the processor has to access the page table in main memory
 To avoid this access time, a part of the page table is kept in a small, fast memory inside the processor, called the TLB (Translation Lookaside Buffer)
 Buffer means a temporary storage place
 The TLB stores part of the page table entries (recently used pages)
Virtual address to physical address translation

Segment Translation:-

 Every segment selector has a linear base address associated with it, stored in a segment descriptor
 A selector is used to point to the descriptor for the segment in a table of descriptors
 The linear base address from the descriptor is then added to the 32-bit offset to generate the 32-bit linear address
 This process is called SEGMENTATION or SEGMENT TRANSLATION
 If the paging unit is not enabled, then the 32-bit linear address corresponds to the physical address
 If the paging unit is enabled, the paging mechanism translates the linear address space into the physical address space by the paging process
Segment translation = convert logical address to linear address

Page Translation:-

 It is the second phase of address translation
 Segment translation translates a logical address to a linear address; page translation converts that linear address to a physical address
 When paging is enabled, the paging unit arranges the physical address space into 1,048,576 pages, each 4096 bytes (4KB) long
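The two translation phases above are plain address arithmetic. A minimal sketch with 4KB pages; the segment base value and the two-entry page table are made-up illustration data, not from the text:

```python
PAGE_SIZE = 4096  # 4KB pages, so the page offset is the low 12 bits

def segment_translation(segment_base, offset):
    # Phase 1: logical -> linear (descriptor's base address + offset)
    return segment_base + offset

def page_translation(linear, page_table):
    # Phase 2: linear -> physical (upper bits replaced via the page table)
    vpn = linear // PAGE_SIZE            # virtual page number (upper bits)
    page_offset = linear % PAGE_SIZE     # lower portion, copied unchanged
    ppn = page_table[vpn]                # missing entry would be a page fault
    return ppn * PAGE_SIZE + page_offset

page_table = {0: 5, 1: 9}                          # illustration only
linear = segment_translation(0x1000, 0x234)        # 0x1234: page 1, offset 0x234
physical = page_translation(linear, page_table)    # 9*4096 + 0x234
```

Note how the page offset passes through untouched; only the page number is looked up and replaced.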


5. Explain the purpose and working of TLB with a diagram.

 It is also called the Page Translation Cache
 If the processor has to refer to two directories → the page directory and the page table in main memory, performance will be reduced. To solve this problem, the processor stores the most recently used page table entries in an ON-CHIP cache. This is called the TLB
 It can hold up to 32 page table entries
 32 page table entries coupled with a 4K page size result in coverage of 128K bytes of memory addresses
 The page table is placed in main memory, but a copy of a small portion of the page table is placed in the processor chip. This memory is called the TLB
 Based on the virtual address, the MMU (Memory Management Unit) searches the TLB for the required page
 If the page table entry for that page is found in the TLB, the physical address can be obtained immediately
 If the entry is not found, there is a MISS in the TLB; the required entry is fetched from the page table in RAM, and then stored in the TLB
 If the OS makes any changes to any entry in the page table, a control bit in the TLB will invalidate that entry in the TLB
 When a program generates an access request to a page that is not in main memory, a page fault will occur
 That page should be brought in from secondary storage (disk)
 When it detects a page fault, the MMU asks the OS to generate an interrupt
 The OS will suspend the execution of the task which created the page fault, and start execution of another task whose pages are ready in main memory
 When the suspended task resumes, that instruction must be continued from the point of interruption, or that instruction must be restarted
 If a new page is brought from disk and main memory is now full, the new page should replace a page in main memory according to the LRU (Least Recently Used) algorithm
 A modified page is written to disk before it is removed from main memory
 The write-through protocol is used for this task

6. Explain the Programmed I/O data transfer technique.

 An I/O operation means a data transfer between an I/O device and memory, or between an I/O device and the processor
 In a computer system, if all the I/O operations are controlled by the processor, then that system is using PROGRAMMED I/O
 If this technique is used, the processor executes programs that start, run and end the I/O operations, including sensing device status, sending a read/write command and transferring data
 The processor periodically checks the status of the I/O system until the operation is completed
Example:-
 The processor's software checks each of the I/O devices regularly
 During the check, the microprocessor sees whether any device needs service or not
 The following routine services I/O ports A, B and C
 The routine (program) checks the status of the I/O ports
 It first transfers the status of I/O port A into the accumulator
 Then the routine checks the contents of the accumulator to see whether the service request bit is SET or RESET
 If SET, the I/O port A service routine is called
 After completing it, the routine moves on to port B
 The process is repeated again
 It continues till all the I/O port status registers are tested and all I/O ports are serviced
 Once this is done, the processor continues to execute normal programs
 When programmed I/O is used, the processor fetches I/O-related instructions from memory and gives the necessary I/O commands to the I/O system for execution
 The technique used for I/O addressing is followed (memory-mapped I/O or I/O-mapped I/O)
 When the processor sees an I/O instruction, the addressed I/O port is expected to be ready to respond, to avoid information loss
 Thus, a processor should always know the I/O device status
 In programmed I/O systems, the processor is usually programmed to test the I/O device status before data transfer
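The port-scanning routine in the Example above can be sketched as a loop. The status-register values and the position of the service-request bit are made-up illustration data:

```python
SERVICE_REQUEST_BIT = 0x01  # assumed position of the service-request flag

def poll_ports(status_registers, service_routines):
    """Programmed I/O polling: read each port's status register and,
    if its service-request bit is SET, call that port's service routine."""
    serviced = []
    for port, status in status_registers.items():
        accumulator = status                    # port status -> accumulator
        if accumulator & SERVICE_REQUEST_BIT:   # service request bit SET?
            service_routines[port]()            # call the port's routine
            serviced.append(port)
    return serviced                             # then resume normal programs

# Ports A and C are requesting service; B is idle.
log = []
routines = {p: (lambda p=p: log.append(p)) for p in "ABC"}
done = poll_ports({"A": 0x01, "B": 0x00, "C": 0x81}, routines)
```

The processor burns cycles running this loop whether or not any device needs service, which is exactly the overhead that interrupt-driven I/O and DMA remove.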

7. What is DMA? Explain DMA cycles and configuration with neat diagrams.
 It comes under hardware-controlled data transfer
 An external device is used to control the data transfer
 The external device generates the address and control signals to control the data transfer
 It allows the peripheral device to directly access the memory
 This technique is called DIRECT MEMORY ACCESS
 The external device that controls the data transfer is called the DMA CONTROLLER
 Data transfer is monitored by the DMA controller, which is also called a DMA channel
 When the CPU wants to read or write a block of data, it issues a command to the DMA module with these instructions:-
 Read/write operation
 Address of the I/O device involved in the operation
 Starting address in memory to read from or write to
 Number of words to be read/written
DMA channel:-
 It consists of a data counter, data register, address register and control logic
 The data counter stores the number of data transfers to be done in one DMA cycle
 It is decremented automatically after each word transfer
 The data register acts as a buffer
 The address register stores the starting address for the transfer
 When the data counter reaches ZERO, the DMA transfer is stopped
 The DMA controller then sends an interrupt to the processor saying that the DMA operation is finished
 Diagram (a) shows that the CPU, DMA module, I/O system and memory share the same system bus
 Here programmed I/O is used: data is transferred between memory and the I/O system through the DMA module
 Each transfer of a word consumes two bus cycles
 Diagram (b) shows a separate path between the DMA module and the I/O system; this is another DMA configuration
 Diagram (c) is the third type of DMA configuration
 Here the I/O devices are connected to the DMA module using an I/O bus
 This reduces the number of I/O interfaces in the DMA module
DMA Idle Cycle:-
 When the system is turned ON, the switches are in the 'A' position
 The buses are connected from the processor to system memory and peripherals
 The processor executes the program until it needs to read a block of data from the disk
 To do this, the processor sends a series of commands to the disk controller, telling it to search for and read the desired block of data
 When the disk controller is ready to transfer the first byte of data from the disk, it sends a DMA request (DRQ) signal to the DMA controller
 Then the DMA controller sends a hold request (HRQ) signal to the processor's HOLD input
 The processor responds to this HOLD signal by sending an acknowledgement (HLDA) to the DMA controller
 When the DMA controller receives the HLDA signal, it sends a control signal to change the switch position from A to B
 This disconnects the processor from the buses and connects the DMA controller to the buses
DMA Active Cycle:-
 When the DMA controller gets control of the buses, it sends out the memory address where the first byte of data from the disk is to be written
 It also sends a DMA acknowledge (DACK) signal to the disk controller device, telling it to get ready to send the byte
 Finally, it asserts the IOR and MEMW signals on the control bus
 The IOR (I/O Read) signal enables the disk controller to send a byte of data from the disk on the data bus
 The MEMW (Memory Write) signal enables the addressed memory to accept the data from the data bus
 The CPU is involved only at the beginning and at the end of the data transfer operations
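The register behaviour of the DMA channel described above can be sketched as a loop over the address register and data counter. This is a logical model of one DMA cycle, not a hardware simulation; the addresses and data are illustration values:

```python
def dma_transfer(memory, start_address, data, count):
    """One DMA cycle: the address register walks forward through memory
    and the data counter is decremented after each word transfer."""
    address_register = start_address
    data_counter = count
    for word in data[:count]:
        memory[address_register] = word   # data register -> memory (MEMW)
        address_register += 1             # address register incremented
        data_counter -= 1                 # data counter decremented
    assert data_counter == 0              # transfer stops; DMAC interrupts CPU
    return address_register

memory = {}
end = dma_transfer(memory, 0x100, [11, 22, 33], 3)
```

When the counter hits zero the controller raises its completion interrupt, which is the only point (besides the initial command) where the CPU is involved.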


8. Explain the different data transfer modes in DMA.
The DMA controller transfers data in any one of the following modes:
 Single Transfer Mode (Cycle Stealing)
 Block Transfer Mode
 Demand (or) Burst Transfer Mode
Single Transfer Mode:-
 In this mode, the device can make only one transfer (byte) at a time
 After each transfer, the DMAC gives control of all buses back to the processor
Series of operations
 The I/O device asserts the DRQ line when it is ready to transfer data
 The DMAC asserts the HOLD line to request use of the buses from the processor
 The processor asserts HLDA, granting bus control to the DMAC
 The DMAC asserts DACK to the requesting I/O device, and executes a DMA bus cycle and data transfer
 The I/O device deasserts its DRQ after a data transfer of 1 byte
 The DMAC deasserts the DACK line
 The byte transfer count is decremented and the memory address is incremented
 The HOLD line is deasserted to give back control of all buses to the processor
 The HOLD signal is reasserted to request use of the buses when the I/O device is ready to transfer another byte; the same process is repeated until the last transfer
 When the data transfer count reaches ZERO, the transfer is finished
Block Transfer Mode:-
 Here, the device can make the number of transfers programmed in the word count register
 After each transfer of a word, the count is decremented by 1 and the address is incremented by 1
 The DMA transfer is continued until the word count becomes ZERO
 It is used when the DMAC needs to transfer a block of data
Series of operations
 The I/O device asserts the DRQ line when it is ready to transfer data
 The DMAC asserts the HOLD line to request use of the buses from the processor
 The processor asserts HLDA, granting bus control to the DMAC
 The DMAC asserts DACK to the requesting I/O device, and executes DMA bus cycles and data transfers
 The I/O device deasserts its DRQ after a data transfer of 1 byte
 The DMAC deasserts the DACK line
 The transfer count is decremented and the memory address is incremented
 If the transfer count is not ZERO, the data transfer is not complete, and the DMAC waits for another DMA request from the I/O device
 When the transfer count = ZERO, the data transfer is finished; the DMAC deasserts HOLD to tell the processor that it does not need the buses hereafter
 The processor then deasserts the HLDA signal to tell the DMAC that it has got back control of the buses
Demand Transfer Mode:-
 Here the device is programmed to continue the data transfer until the TC (Terminal Count) or EOP (End of Process) signal is encountered, or until DREQ (DMA Request) becomes inactive
Series of operations
1. The I/O device asserts the DRQ line when it is ready to transfer data
2. The DMAC asserts the HOLD line to request use of the buses from the processor
3. The processor asserts HLDA, granting bus control to the DMAC
4. The DMAC asserts DACK to the requesting I/O device, and executes DMA bus cycles and data transfers
5. The I/O device deasserts its DRQ after a data transfer of 1 byte
6. The DMAC deasserts the DACK line
7. The byte transfer count is decremented and the memory address is incremented
8. The DMAC continues to execute data transfers until TC or EOP is encountered
9. The I/O device can restart the DMA request by sending the DRQ (DMA Request) signal once again
10. The data transfer continues until the transfer count = ZERO
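The essential difference between the modes is how often the DMAC has to re-arbitrate for the buses. A deliberately simplified toy model counting HOLD/HLDA handshakes (it ignores demand-mode pauses when DREQ drops, which would add extra handshakes):

```python
def hold_requests(num_bytes, mode):
    """Single mode steals one bus cycle per byte, so it re-arbitrates
    every time; block and demand modes hold the bus for the whole
    programmed count (demand mode assumed here to keep DREQ active)."""
    if mode == "single":
        return num_bytes     # one HOLD/HLDA handshake per byte
    elif mode in ("block", "demand"):
        return 1             # one handshake for the entire block
    raise ValueError(mode)
```

Cycle stealing keeps the processor responsive at the cost of arbitration overhead; block mode minimizes overhead but locks the processor off the buses for the whole block.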


9. Explain in detail the Bus Arbitration techniques in DMA
 The device that is allowed to initiate data transfers on the bus at any given time is called the BUS MASTER
 In a computer system, there may be more than one bus master, such as the processor, DMA controller, etc
 They share the system bus
 When the current bus master gives back bus control, another bus master gets the bus control
 Bus arbitration is defined as the process by which the next device to become bus master is selected and bus mastership is transferred to it
 Selection is done on a priority basis
 There are two types of bus arbitration techniques in DMA:-
Centralized arbitration technique
Distributed arbitration technique
Centralized arbitration technique:-
 A single bus arbiter performs the arbitration
 The bus arbiter may be the processor or a separate controller
 There are three types of centralized arbitration:-
 Daisy chaining
 Polling method
 Independent request
Daisy Chaining:-
 It is a simple and easy method
 All masters make use of the same line for bus requests
 In response to a bus request, the controller sends a BUS GRANT signal if the bus is free
 The BUS GRANT signal serially propagates through each master until it encounters the first one that is requesting access to the bus
 This master blocks the propagation of the BUS GRANT signal, activates the BUSY line and gains control of the bus
 Any other requesting master will not receive the grant signal and cannot get bus access
Adv-Daisy Chaining:-
 Simpler and cheaper method
 It requires the least number of lines, and that number is independent of the number of masters in the system
Disadv-Daisy Chaining:-
 The propagation delay of the bus grant signal is proportional to the number of masters in the system. This makes arbitration slow; therefore only a limited number of masters are allowed in a system
 The priority of a master is fixed by its physical location
 Failure of one master makes the whole system fail
Polling Method:-
 A controller is used to generate addresses for the masters
 The number of address lines required depends on the number of masters connected in the system
 If there are 8 masters in the system, at least three address lines are needed
 If anybody sends a BUS request, the controller generates a sequence of master addresses
 When a requesting master finds its own address, it activates the BUSY line signal
Adv-Polling method:-
 Priority can be changed by changing the polling sequence in the controller
 If one module fails, the entire system does not fail
 More improved than the daisy chaining method
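Daisy-chain grant propagation can be modelled as a scan down the chain: the grant stops at the first requesting master, so priority is fixed by physical position. A logical sketch; the request pattern is illustration data:

```python
def daisy_chain_grant(requesting):
    """BUS GRANT propagates master by master (index 0 is closest to the
    controller) and is blocked by the first master that is requesting."""
    for position, wants_bus in enumerate(requesting):
        if wants_bus:
            return position    # this master activates BUSY and wins
    return None                # no requests: grant is not issued

# Masters 0..3; masters 1 and 3 both request -> master 1 wins,
# purely because it sits earlier in the chain.
winner = daisy_chain_grant([False, True, False, True])
```

This makes the fixed-priority disadvantage concrete: master 3 can never beat master 1, no matter how urgent its request is.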

Independent priority method:-
 Each master has a separate pair of BUS REQUEST and BUS GRANT lines, and each pair has a priority assigned to it
 The built-in priority decoder within the controller selects the highest-priority request and asserts the corresponding BUS GRANT signal
Adv-Independent priority:-
 Due to the separate pairs of bus request and bus grant signals, arbitration is fast
 Arbitration time is independent of the number of masters in the system
Disadv-Independent priority:-
 It requires more bus request and bus grant signals
Distributed arbitration:-
 All devices participate in the selection of the next bus master
 Each device on the bus is assigned a 4-bit ID
 The number of bits in the ID depends on the number of devices
 When one or more devices request bus control, they assert the START-ARBITRATION signal and place their 4-bit IDs on the arbitration lines, ARB0 to ARB3
 More than one device can place its 4-bit ID to indicate that it needs control of the bus
 If one device puts 1 on a bus line and another device puts 0 on the same bus line, the bus line status will be 0
 A device reads the status of all lines through inverter buffers, so the device reads a bus status of 0 as logic 1
 The device having the highest ID has the highest priority
 When two or more devices place their IDs on the bus lines, it is necessary to find the highest ID from the status of the bus lines
 For example, consider two devices A and B, having IDs 1 and 6, requesting the bus
 Device A puts the bit pattern 0001, device B puts 0110
 With this combination, the bus line status will be 1000
 The code seen by both devices through the inverter buffers is 0111
 Each device compares the code formed on the arbitration lines to its own ID, starting from the MSB
 If it finds a difference at any bit position, it disables its drivers at that position and all lower-order positions by placing 0 at the input of those drivers
 Here, device A detects a difference on line ARB2
 It disables its drivers on lines ARB2, ARB1 and ARB0
 This makes the code on the arbitration lines change to 0110
 0110 → 6, which is the ID of B
 This means B wins the competition
Adv:-
 It offers high reliability because operation of the bus is not dependent on any single device

10. What are Interrupts? Explain the Interrupt hardware in detail with necessary diagrams.
Interrupts:-
 An external event that affects the normal flow of instruction execution, generated by external hardware devices such as the keyboard, mouse, etc, is called an interrupt
 Ex: the computer should respond to the keyboard, mouse, etc when they ask for something
 If a device wants to tell the processor about the completion of an operation, it sends a hardware signal; that signal is called an Interrupt
 A special routine that is executed to service the interrupt is called an Interrupt Service Routine (ISR)
 An interrupt request line is used to alert the processor
 A program can be interrupted in three ways:-
 By an external signal
 By a special instruction in the program
 By some other condition
Ex:- The main program runs from Instruction 1 towards Instruction n; when an INTERRUPT OCCURS between two instructions, control transfers to the ISR, and after the ISR completes, execution returns to the main program
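The self-selection procedure above can be modelled logically (this is not an electrical simulation of the open-collector lines; it captures the MSB-first drop-out rule by which the highest competing ID survives on the arbitration lines):

```python
def arbitrate(ids, width=4):
    """Distributed arbitration: competing devices place their IDs on the
    lines; comparing from the MSB, any device that sees a 1 on a line
    where its own ID has 0 disables its drivers for that line and all
    lower-order lines. The code left on the lines is the highest ID."""
    active = set(ids)
    for bit in reversed(range(width)):        # MSB first: ARB3 .. ARB0
        if any(i & (1 << bit) for i in active):
            # devices with 0 at this position drop out of the contest
            active = {i for i in active if i & (1 << bit)}
    return max(active)

# Devices A (ID 1 = 0001) and B (ID 6 = 0110) compete: B wins.
winner = arbitrate([1, 6])
```

Because every device applies the same rule locally, no central arbiter is needed, which is where the high-reliability advantage comes from.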

 An interrupt caused by an external signal is called a hardware interrupt
 Conditional interrupts, or interrupts created by special instructions, are called software interrupts
Interrupt Hardware:-
 An I/O device requests an interrupt by activating a bus line called the interrupt request line
 Interrupts are classified as single-level and multi-level interrupts
Single level Interrupts:-
 There can be many interrupting devices, but all interrupt requests are made via a single input pin of the CPU
 When interrupted, the CPU has to poll the I/O ports to identify the requesting device
 Polling is a software routine that checks the state of each device
 Once the interrupting I/O port is found, the CPU services it and then returns to the task it was performing before the interrupt
 Interrupt requests from all devices are logically ORed and connected to the interrupt input of the processor
 The interrupt request from any device is routed to the processor interrupt input
 After getting interrupted, the processor identifies the requesting device by reading the interrupt status of each device
 All devices are connected to the INTR line via switches to ground
 To request an interrupt, a device closes its associated switch
 If all the interrupt signals I0 … In are inactive, the interrupt request line stays equal to VDD
 When a device requests an interrupt, the voltage on the line drops to zero
 Thus, if a switch is closed, the line reads 0 and the processor treats INTR = 1 (the line is active low)
 Open-collector and open-drain gates are used, because the output of an open-collector (or open-drain) gate is equivalent to a switch to ground that is normally open
Multi-Level interrupts:-
 The processor has more than one interrupt pin
 I/O devices are tied to individual interrupt pins
 Interrupting devices can be immediately identified by the CPU upon receiving an interrupt request
 This allows the processor to go directly to that I/O device and service it, without the polling concept
 This saves time in processing the input
 When a processor is interrupted, it stops executing its current program and calls a special routine
 The event that causes the interruption is called an interrupt
 The processor finishes its current instruction; an instruction is not cut off midway
 The program counter's current contents are stored on the stack
 Then the PC is loaded with the address of the ISR
 The ISR continues working until it completes; then the saved PC is restored and the interrupted program resumes

Enabling and Disabling interrupts:-
 Maskable interrupts are enabled and disabled under program control
 By SETting and RESETting particular flip-flops in the processor, interrupts can be masked or unmasked
 When masked, the processor does not respond to an interrupt even though the interrupt is activated
 Most processors provide a masking facility
 In some processors, those interrupt inputs which can be masked under software control are called maskable interrupts
 The interrupts that cannot be masked under software control are called non-maskable interrupts
Exceptions:-
 An interrupt is an event that suspends the processing of the currently executing program and begins execution of another program
 Many events can cause interrupts; these events are called exceptions
 An I/O interrupt is one subtype of exception
 Exceptions can be classified as: faults, traps (or) aborts
 Faults:-
 Faults are a type of exception that are detected and serviced BEFORE the execution of the faulting instruction
 Ex: In VM, if the page or segment referenced by the processor is not present, the OS fetches that page from disk using the fault exception routine
 Traps:-
 Traps are exceptions that are reported immediately AFTER the execution of the instruction which causes the problem
 Ex: user-defined interrupts such as a Divide-by-Zero error
 Aborts:-
 Aborts are exceptions which do not permit the precise location of the instruction causing the exception to be found
 They are used to report severe errors, such as hardware errors or illegal values in the system
Debugging:-
 System software contains a system program called a debugger
 A debugger is a program that helps the programmer find and remove errors in a program
 It uses two types of exceptions: trace and breakpoint
 To use the trace exception, it is necessary to put the processor in trace mode
 If the processor is in trace mode, an exception occurs after the execution of every instruction
 This is used to execute the debug program as an exception service routine
 This exception service routine lets the user examine the contents of registers, memory locations, etc
 The trace exception is disabled during the execution of the debugging program
 A debugger also allows the programmer to set breakpoints at any point in the program
 In this mode, the system executes instructions up to the breakpoint and creates a breakpoint exception
 This exception routine allows the contents of registers and memory locations to be checked
 The programmer can verify whether the program is correct up to that point or not

11. Write notes on I/O processor and explain its features with a neat diagram.
 An I/O processor (IOP) is a processor with DMA and interrupt capability that relieves the CPU of the workload of communicating with I/O devices
 A computer system may have one CPU and one or more IOPs
 An IOP that communicates with remote terminals over communication lines and other communication media is called a data communication processor (DCP)
 An IOP is not dependent on the CPU
 It transfers data between external devices and memory under the control of an I/O program
 The I/O program is initiated by the CPU
 Communication between the IOP and a device attached to it is similar to programmed I/O
 IOP-to-memory communication is through DMA
 The CPU sends instructions to the IOP to start it or to test its status
 When an I/O operation is desired, the CPU informs the IOP where to find the I/O program
 The I/O program contains instructions regarding the data transfer
 The instructions in the I/O program are prepared by system programmers and are called "commands"; they are different from CPU instructions
Features of IOP:-
 An IOP can fetch and execute its own instructions
 Its instructions are specially designed for I/O processing
 The Intel 8089 IOP can perform arithmetic and logical operations, data transfers, searching, branching and translation
 The IOP does all the work involved in an I/O transfer, including device set-up, programmed I/O and DMA
 The IOP can transfer data from an 8-bit source to a 16-bit destination
 Communication between the IOP and the CPU is through memory-based control blocks; the CPU defines tasks in the control blocks to locate a program sequence, called a channel program
 The IOP supports multiprocessing; the IOP and CPU can do processing at the same time
Intel 8089 IOP:-
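The enable/disable mechanism described under "Enabling and Disabling interrupts" above amounts to ANDing the pending requests with a mask register whose bits are the SET/RESET flip-flops. A minimal sketch; the bit assignments for the three devices are made up for illustration:

```python
KEYBOARD, DISK, TIMER = 0b001, 0b010, 0b100  # assumed mask-bit positions

def serviceable(pending, mask):
    """Maskable interrupts: a pending request reaches the processor only
    if its enable bit (flip-flop) in the mask register is SET; a masked
    request is simply not seen, even though the device asserts it."""
    return pending & mask

mask = KEYBOARD | TIMER            # DISK interrupts are masked (disabled)
pending = DISK | TIMER             # both devices are requesting
visible = serviceable(pending, mask)   # only the TIMER request gets through
```

A non-maskable interrupt corresponds to a request line that bypasses this AND gate entirely, so software cannot suppress it.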

