Unit 6 JSPSingh


Unit-6

Computer Arithmetic:
Addition and Subtraction Algorithm, Multiplication Algorithm
Introduction to Parallel Processing: Pipelining, Characteristics of Multiprocessors, Interconnection
Structures, Parallel Processing
Latest technology and trends in computer architecture: next-generation processor architectures,
microarchitecture, latest processors for smartphones, tablets and desktops
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Computer Arithmetic

Ø Arithmetic instructions manipulate data to produce results necessary for
  the solution of computational problems

Ø The four basic arithmetic operations are Addition, Subtraction,
  Multiplication and Division

Ø An arithmetic processor is the part of a processor unit that executes
  arithmetic operations

Ø Arithmetic operations can be performed on the following data types
  • Fixed-point binary data in signed-magnitude representation
  • Fixed-point binary data in signed 2’s complement representation
  • Floating-point binary data
  • Binary-coded decimal data

Ø Algorithm – the solution to any problem stated as a finite number
  of well-defined procedural steps
Addition and Subtraction
Representation of both positive and negative numbers

- The following 3 representations are used:
  Signed-magnitude representation
  Signed 1's complement representation
  Signed 2's complement representation

Example: Represent +9 and -9 as 7-bit binary numbers

Only one way to represent +9 ==> 0 001001

Three different ways to represent -9:

In signed magnitude:      1 001001
In signed 1's complement: 1 110110
In signed 2's complement: 1 110111

Fixed-point numbers are represented with either the integer part only or the fractional part only.
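As an illustrative aside (not from the slides), the three representations can be generated programmatically; the helper name and default bit width are assumptions:

```python
def representations(n, bits=7):
    """Return (signed-magnitude, 1's complement, 2's complement) of n
    as bit strings of the given width."""
    mag = abs(n)
    sm = ('1' if n < 0 else '0') + format(mag, '0{}b'.format(bits - 1))
    if n >= 0:
        return sm, sm, sm          # positive numbers look the same in all three
    ones = format((1 << bits) - 1 - mag, '0{}b'.format(bits))   # invert all bits
    twos = format((1 << bits) - mag, '0{}b'.format(bits))       # invert and add 1
    return sm, ones, twos
```

For -9 this reproduces the three patterns 1001001, 1110110 and 1110111 from the example above.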
Addition and Subtraction
Addition and Subtraction with Signed Magnitude Data
Operation    Add         Subtract Magnitudes
             Magnitudes  A > B      A < B      A = B
(+A)+(+B)    +(A+B)
(+A)+(-B)                +(A-B)     -(B-A)     +(A-B)
(-A)+(+B)                -(A-B)     +(B-A)     +(A-B)
(-A)+(-B)    -(A+B)
(+A)-(+B)                +(A-B)     -(B-A)     +(A-B)
(+A)-(-B)    +(A+B)
(-A)-(+B)    -(A+B)
(-A)-(-B)                -(A-B)     +(B-A)     +(A-B)

Computer Arithmetic

Ø Addition (Subtraction) Algorithm

Ø When the signs of A and B are identical (different), add the
  magnitudes and attach the sign of A to the result

Ø When the signs of A and B are different (identical), compare the
  magnitudes and subtract the smaller number from the larger
Ø Choose the sign of the result to be the same as A if A > B,
Ø or the complement of the sign of A if A < B
Ø If A = B, subtract B from A and make the sign of the result positive
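The rules above can be sketched in Python; this is a minimal illustration working on separated sign bits (0 for +, 1 for -) and magnitudes, with names of my own choosing, not the hardware algorithm itself:

```python
def sm_add(sign_a, a, sign_b, b, subtract=False):
    """Signed-magnitude addition/subtraction per the textbook rules."""
    if subtract:
        sign_b ^= 1          # subtraction: complement the sign of B, then add
    if sign_a == sign_b:     # identical signs: add magnitudes, keep sign of A
        return sign_a, a + b
    if a > b:                # different signs: subtract smaller from larger,
        return sign_a, a - b         # result takes the sign of A
    if a < b:
        return sign_a ^ 1, b - a     # result takes the complemented sign of A
    return 0, 0              # A == B: result is +0
```

For example, (+5) + (-3) gives (0, 2), i.e. +2, and (-3) + (+5) gives the same result with the sign complemented from A.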
Hardware Implementation

[Figure: register B (with sign flip-flop Bs) feeds a complementer controlled
by mode control M; a parallel adder with output-carry flip-flop E and
add-overflow flip-flop AVF loads the sum into register A (with sign
flip-flop As)]

A simple procedure would require a magnitude comparator, an adder and two subtractors; the
alternative reveals that using 2's complement for the operation requires only an adder and a
complementer

M = 0: output = A + B        M = 1: output = A + B' + 1 = A - B


Flow Chart for Add and Subtract Operation
Signed 2’s Complement Representation

Ø Addition:

Ø Addition of two numbers in signed 2’s complement form consists of
  adding the numbers with the sign bits treated the same as the other
  bits of the number. The carry out of the sign bit is discarded

Ø The sum is obtained by adding the contents of AC and BR (including the
  sign bit). The overflow bit is set to 1 if the EX-OR of the last two carries is 1

Ø Subtraction:
Ø Subtraction consists of first taking the 2’s complement of the
  subtrahend and then adding it to the minuend

Ø Subtraction is done by adding the content of AC to the 2’s complement of BR
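A minimal Python sketch of 2's complement addition with this overflow test (an illustration under the assumption of an 8-bit word; names are my own):

```python
BITS = 8
MASK = (1 << BITS) - 1

def twos_add(ac, br):
    """Add two BITS-wide 2's-complement words; the carry out of the sign
    bit is discarded. Overflow = XOR of the carries into and out of the
    sign-bit position."""
    raw = ac + br
    result = raw & MASK
    carry_out = raw >> BITS
    # carry into the sign bit: add everything except the sign bits themselves
    carry_in = ((ac & (MASK >> 1)) + (br & (MASK >> 1))) >> (BITS - 1)
    return result, carry_in ^ carry_out

def twos_sub(ac, br):
    """AC - BR = AC + (2's complement of BR)."""
    return twos_add(ac, (~br + 1) & MASK)
```

Adding 100 + 50 overflows an 8-bit word (both operands positive, result bit pattern negative), while 70 + (-20) = 50 does not, even though it produces a discarded carry.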
Signed 2’s Complement Representation
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Binary Multiplication

The sign of the product is determined from the signs of the multiplicand and the multiplier: if they are the same, the product is positive, otherwise negative
H/W Implementation - Multiplication

For multiplication in hardware, the process is slightly changed:

1. In place of registers equal in number to the multiplier bits, use
   one adder and one register

2. Instead of shifting the multiplicand left, the partial product is
   shifted right

3. When the corresponding bit of the multiplier is 0, there is no need
   to add zero to the partial product
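The three changes above amount to the classic sequential shift-add multiplier. A sketch for unsigned magnitudes (register names A and Q follow the usual textbook convention; the function name and default width are assumptions):

```python
def shift_add_multiply(multiplicand, multiplier, bits=8):
    """Hardware-style shift-add multiply: the partial product shifts
    right instead of the multiplicand shifting left, and zero multiplier
    bits cause only a shift."""
    A = 0                       # accumulator (becomes the upper half of the product)
    Q = multiplier              # lower half; multiplier bits consumed LSB-first
    for _ in range(bits):
        if Q & 1:               # add the multiplicand only when the bit is 1
            A += multiplicand
        # shift the A,Q pair right as one unit: A's LSB moves into Q's MSB
        Q = ((A & 1) << (bits - 1)) | (Q >> 1)
        A >>= 1
    return (A << bits) | Q      # concatenated double-length product
```

After `bits` iterations A:Q holds the full double-length product, e.g. 3 x 5 = 15.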
H/W implementation -Multiplication
Flow Chart Binary Multiplication
Booth’s Multiplication Algorithm

Ø Used for multiplication in signed 2’s complement representation

Ø It relies on the fact that strings of 0’s in the multiplier require no
  addition, just shifting

Ø A string of 1’s in the multiplier from bit weight 2^k down to 2^m can be
  treated as 2^(k+1) - 2^m

Ø E.g. +14 (001110), here k = 3, m = 1, represented as 2^4 - 2^1 = 16 - 2 = 14

Thus M x 14 = M x 2^4 - M x 2^1
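The identity is easy to check numerically; the multiplicand value below is an arbitrary choice for illustration:

```python
# string of 1's in 14 = 001110 runs from bit 3 down to bit 1,
# so 14 = 2^4 - 2^1 and the multiply needs one subtraction plus shifts
M = 23                                   # arbitrary multiplicand
assert M * 14 == (M << 4) - (M << 1)     # M x 14 = M x 2^4 - M x 2^1
print("M x 14 computed with a single subtraction and two shifts")
```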
Booth’s Multiplication Algorithm

Ø Used for multiplication in signed 2’s complement representation

Ø Prior to shifting, the multiplicand may be added to the partial product,
  subtracted from the partial product, or left unchanged according to the
  following rules:

1. The multiplicand is subtracted from the partial product upon encountering
   the first least significant 1 in a string of 1’s in the multiplier

2. The multiplicand is added to the partial product upon encountering the
   first 0 (provided that there was a previous 1) in a string of 0’s in the
   multiplier

3. The partial product does not change when the multiplier bit is identical
   to the previous multiplier bit
Booth’s Multiplication Algorithm
Solve using Booth’s Algorithm
−9 × −13

´ BR register (5 bits) holds the multiplicand −9
´ +9 = 0 1001
´ 1’s Compl. = 1 0110
´ 2’s Compl. = 1 0111 (BR = −9)
´ BR’ + 1 = 0 1001 (= +9, used when subtracting BR)

´ QR register (5 bits) holds the multiplier −13
´ +13 = 0 1101
´ 1’s Compl. = 1 0010
´ 2’s Compl. = 1 0011 (QR = −13)
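The worked example can be checked with a short Python sketch of the textbook register-level procedure (AC, QR and the extra bit Q_{n+1}); the function name and unsigned-masking details are my own:

```python
def booth_multiply(m, q, bits=5):
    """Booth's algorithm in signed 2's complement (textbook AC/QR/Qn+1 form)."""
    mask = (1 << bits) - 1
    ac, qr, q_extra = 0, q & mask, 0
    br, br_neg = m & mask, (-m) & mask          # BR and its 2's complement
    for _ in range(bits):
        pair = (qr & 1, q_extra)
        if pair == (1, 0):                       # first 1 of a string: AC <- AC - BR
            ac = (ac + br_neg) & mask
        elif pair == (0, 1):                     # first 0 after 1's: AC <- AC + BR
            ac = (ac + br) & mask
        # arithmetic shift right of AC, QR, Qn+1 as one unit
        q_extra = qr & 1
        sign = ac >> (bits - 1)
        qr = ((ac & 1) << (bits - 1)) | (qr >> 1)
        ac = (sign << (bits - 1)) | (ac >> 1)
    product = (ac << bits) | qr                  # double-length result AC:QR
    if product >> (2 * bits - 1):                # interpret as signed
        product -= 1 << (2 * bits)
    return product
```

Running it on the slide's operands, `booth_multiply(-9, -13)` yields 117.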
Example - Booth’s Multiplication Algorithm
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Parallel Processing

´ Parallel processing: the execution of concurrent events in the computing
  process to achieve faster computational speed
´ The purpose of parallel processing is to speed up the computer's processing
  capability and increase its throughput, i.e. the amount of processing that
  can be accomplished during a given interval of time
´ The amount of hardware increases with parallel processing, which increases the cost
´ Levels of parallel processing
´ Job or Program level
´ Task or Procedure level
´ Inter-Instruction level
´ Intra-Instruction level
´ Lowest level: shift registers, registers with parallel load
´ Higher level: a multiplicity of functional units that perform identical or different tasks
Multiple Functional Units
Parallel Computers

Architectural Classification
´ Flynn's classification
´ Based on the multiplicity of Instruction Streams and Data Streams
´ Instruction Stream
´ Sequence of Instructions read from memory

´ Data Stream
´ Operations performed on the data in the processor

                                 Number of Data Streams
                                 Single      Multiple
Number of          Single        SISD        SIMD
Instruction
Streams            Multiple      MISD        MIMD
SISD Processors

´ Characteristics
´ Uni-processor machine, capable of executing single instructions, operating on a single
data stream
´ Single computer containing a control unit, processor and memory unit
´ Instructions and data are stored in memory and executed sequentially
´ may or may not have parallel processing
´ parallel processing can be achieved by pipelining

[Figure: Control Unit -> (instruction stream) -> Processor Unit -> (data stream) <-> Memory Unit]
SIMD Processors
´ Characteristics
´ Capable of executing the same instruction on all the processors but operating on different data
streams
´ Only one copy of the program exists
´ A single controller executes one instruction at a time
[Figure: a control unit issues a single instruction stream over the data bus
to processor units P1 ... Pn; each processor has its own data stream and is
connected through an alignment network to memory modules M1 ... Mn]
MISD Processors
´ Characteristics
´ Capable of executing different instructions on different processors, but all of them
operating on the same data set
´ Such machines are not practically useful

[Figure: multiple control units CU1 ... CUn, each driving a processor
P1 ... Pn with its own instruction stream; all processors operate on a
single data stream from memory]
MIMD Processors
´ Characteristics
´ Capable of executing multiple instructions on multiple data sets
´ Capable of processing several programs simultaneously
Interconnection Structure
´ The components that form a multiprocessor system are CPUs, IOPs connected to input-
output devices, and a memory unit
´ The interconnection between the components can have different physical
configurations, depending on the number of transfer paths that are available
´ Between the processors and memory in a shared memory system
´ Among the processing elements in a loosely coupled system

´ There are several physical forms available for establishing an interconnection network.
´ Time-Shared common bus
´ Multiport Memory
´ Crossbar Switch
´ Multistage Switching Network
´ Hypercube System
Time-Shared Common Bus
´ A common-bus multiprocessor system consists of a number of processors connected
through a common path to a memory unit
´ Only one processor can communicate with the memory or another processor at any given
time.
´ As a consequence, the total overall transfer rate within the system is limited by the speed of the
single path
´ Part of the local memory may be designed as a cache memory attached to the CPU
Multiport Memory
´ A multiport memory system employs separate buses between each memory module
and each CPU
´ The module must have internal control logic to determine which port will have access to
memory at any given time
´ Memory access conflicts are resolved by assigning fixed priorities to each memory port.
´ Advantage: The high transfer rate can be achieved because of the multiple paths.
´ Disadvantage: Requires expensive memory control logic and a large number of cables
and connections
Crossbar Switch
´ Consists of a number of cross-points that are placed at intersections between processor
buses and memory module paths
´ The small square in each cross-point is a switch that determines the path from a
processor to a memory module
´ Advantage: Supports simultaneous transfers from all memory modules
´ Disadvantage: The hardware required to implement the switch can become quite large
and complex
[Figure: a 4x4 crossbar - CPU1 ... CPU4 on the rows, memory modules
MM1 ... MM4 on the columns, with a switch at each cross-point]
Multistage Switching Network
Pipelining
´ A technique of decomposing a sequential process into suboperations, with each
  subprocess executed in a special dedicated segment that operates concurrently
  with all other segments

´ It is characteristic of pipelining that several computations can be in progress in
  distinct segments at the same time

´ Each segment performs partial processing, as dictated by the way the task is partitioned

´ The result obtained from the computation in each segment is transferred to the next
  segment in the pipeline

´ The final result is obtained after the data has passed through all segments
Design of Basic Pipeline

´ In a pipelined processor, a pipeline has two ends, the input end and the output end

´ Between these ends, there are multiple stages/segments such that output of one stage
is connected to input of next stage and each stage performs a specific operation

´ Interface registers are used to hold the intermediate output between two stages; these
interface registers are also called latches or buffers

´ All the stages in the pipeline along with the interface registers are controlled by a
common clock
Design of Basic Pipeline
´ Pipeline Stages

´ A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set

´ Following are the 5 stages of the RISC pipeline with their respective operations

´ Stage 1 (Instruction Fetch)

´ In this stage the CPU reads the instruction from the memory address whose value is present in the program counter

´ Stage 2 (Instruction Decode)

´ In this stage, the instruction is decoded and the register file is accessed to get the values of the registers used in the instruction

´ Stage 3 (Instruction Execute)

´ In this stage, ALU operations are performed

´ Stage 4 (Memory Access)

´ In this stage, memory operands are read from or written to the memory address present in the instruction

´ Stage 5 (Write Back)

´ In this stage, the computed/fetched value is written back to the register specified in the instruction
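The overlap of these five stages can be sketched with a tiny scheduler; this is an idealized model assuming no stalls or hazards, and the function name is my own:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n_instructions):
    """Map each instruction to the stage it occupies on each clock cycle
    of an ideal 5-stage pipeline (no stalls or hazards)."""
    total_cycles = len(STAGES) + n_instructions - 1   # k + n - 1 cycles overall
    table = []
    for i in range(n_instructions):
        # instruction i enters IF on cycle i+1 and leaves WB on cycle i+5
        row = {cycle: STAGES[cycle - i - 1]
               for cycle in range(i + 1, i + 1 + len(STAGES))}
        table.append(row)
    return total_cycles, table
```

Three instructions finish in 5 + 3 - 1 = 7 cycles instead of 15 sequential ones.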
Pipelining
´ The simplest way to understand pipelining is to imagine that each segment consists
  of an input register followed by a combinational circuit. The output of the
  combinational circuit in a segment is applied to the input register of the next segment

Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

[Figure: Segment 1 loads Ai, Bi into R1, R2; Segment 2 multiplies R1 * R2
into R3 and loads Ci from memory into R4; Segment 3 adds R3 + R4 into R5]

R1 <- Ai, R2 <- Bi            Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci       Multiply and load Ci
R5 <- R3 + R4                 Add
Operation of the Pipeline Stages

Clock Pulse   Segment 1       Segment 2            Segment 3
Number        R1     R2       R3          R4       R5
1             A1     B1
2             A2     B2       A1 * B1     C1
3             A3     B3       A2 * B2     C2       A1 * B1 + C1
4             A4     B4       A3 * B3     C3       A2 * B2 + C2
5             A5     B5       A4 * B4     C4       A3 * B3 + C3
6             A6     B6       A5 * B5     C5       A4 * B4 + C4
7             A7     B7       A6 * B6     C6       A5 * B5 + C5
8                             A7 * B7     C7       A6 * B6 + C6
9                                                  A7 * B7 + C7
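The clock-by-clock behaviour in the table above can be reproduced by a small simulation; this is a sketch in which Ci is carried alongside R1 and R2 rather than fetched from memory, and all names are my own:

```python
def pipeline_abc(A, B, C):
    """Clock-by-clock simulation of the 3-segment pipeline for Ai*Bi + Ci.
    On every clock all interface registers load simultaneously."""
    n = len(A)
    r12 = r34 = r5 = None                 # latch contents of each segment
    results = []
    for clock in range(n + 2):            # n tasks finish in k + n - 1 = n + 2 clocks
        new_r5 = r34[0] + r34[1] if r34 else None             # segment 3: add
        new_r34 = (r12[0] * r12[1], r12[2]) if r12 else None  # segment 2: multiply, load Ci
        new_r12 = (A[clock], B[clock], C[clock]) if clock < n else None  # segment 1: load
        r12, r34, r5 = new_r12, new_r34, new_r5               # all latches clock together
        if r5 is not None:
            results.append(r5)
    return results
```

For 7 tasks the loop runs 3 + 7 - 1 = 9 clocks, matching the 9 clock pulses in the table.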
General Pipeline
General Structure of a 4-Segment Pipeline

[Figure: input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with all
registers driven by a common clock]

Space-Time Diagram

Clock cycles  1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6
Pipeline Speed-Up
n: Number of tasks to be performed

´ Conventional Machine (Non-Pipelined)


´ tn: Clock cycle (time to complete each task)
´ t1: Time required to complete the ‘n’ tasks
´ t1 = n * tn

´ Pipelined Machine (k stages)


´ tp: Clock cycle (time to complete each suboperation)
´ tk: Time required to complete the n tasks
´ tk = (k + n - 1) * tp

´ Speedup
´ Sk: Speedup

´ Sk = n*tn / ((k + n - 1) * tp)
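The speedup formula is easy to evaluate numerically; the segment count and clock times below are illustrative values, not from the slides:

```python
def speedup(n, k, tn, tp):
    """Pipeline speedup: Sk = n*tn / ((k + n - 1) * tp)."""
    return (n * tn) / ((k + n - 1) * tp)

k, tp = 4, 20          # 4 segments, 20 ns per suboperation (illustrative)
tn = k * tp            # non-pipelined task time equals k clock periods
print(speedup(100, k, tn, tp))   # already close to the limit k = 4
```

As n grows, the result climbs toward k, the maximum theoretical speedup.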
Pipeline Speed-Up
As n becomes much larger than k - 1, (k + n - 1) approaches n

´ Then: S = tn / tp

If we assume the time taken to complete a task is the same in both circuits, then tn = k*tp and the speedup reduces to

´ S = k*tp / tp = k

´ i.e. the maximum theoretical speedup a pipeline can provide is k
