Unit 6 JSPSingh


Unit-6

Computer Arithmetic:
Addition and Subtraction Algorithm, Multiplication Algorithm
Introduction to Parallel Processing: Pipelining, Characteristics of Multiprocessors, Interconnection
Structures, Parallel Processing
Latest technology and trends in computer architecture: next-generation processor architectures,
microarchitecture, latest processors for smartphones, tablets and desktops
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Computer Arithmetic

Ø Arithmetic instructions manipulate data to produce results necessary for
  the solution of computational problems

Ø The four basic arithmetic operations are Addition, Subtraction,
  Multiplication and Division

Ø An arithmetic processor is the part of a processor unit that executes
  arithmetic operations

Ø Arithmetic operations can be performed on the following data types
  • Fixed-point binary data in signed-magnitude representation
  • Fixed-point binary data in signed 2’s complement representation
  • Floating-point binary data
  • Binary-coded decimal data

Ø Algorithm – the solution to any problem stated as a finite number
  of well-defined procedural steps
Addition and Subtraction
Representation of both positive and negative numbers

- The following 3 representations are used:
  Signed-magnitude representation
  Signed 1's complement representation
  Signed 2's complement representation

Example: Represent +9 and -9 as 7-bit binary numbers

Only one way to represent +9 ==> 0 001001

Three different ways to represent -9:

In signed magnitude:      1 001001
In signed 1's complement: 1 110110
In signed 2's complement: 1 110111

Fixed-point numbers are represented with either the integer part only or the fractional part only.
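As an illustrative aside (not from the slides), the three representations can be generated programmatically; the helper name and default bit width are assumptions:

```python
def representations(n, bits=7):
    """Return (signed-magnitude, 1's complement, 2's complement) of n
    as bit strings of the given width."""
    mag = abs(n)
    sm = ('1' if n < 0 else '0') + format(mag, '0{}b'.format(bits - 1))
    if n >= 0:
        return sm, sm, sm          # positive numbers look the same in all three
    ones = format((1 << bits) - 1 - mag, '0{}b'.format(bits))   # invert all bits
    twos = format((1 << bits) - mag, '0{}b'.format(bits))       # invert and add 1
    return sm, ones, twos
```

For -9 this reproduces the three patterns 1001001, 1110110 and 1110111 from the example above.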
Addition and Subtraction
Addition and Subtraction with Signed Magnitude Data
Operation    Add         Subtract Magnitudes
             Magnitudes  A > B      A < B      A = B
(+A)+(+B)    +(A+B)
(+A)+(-B)                +(A-B)     -(B-A)     +(A-B)
(-A)+(+B)                -(A-B)     +(B-A)     +(A-B)
(-A)+(-B)    -(A+B)
(+A)-(+B)                +(A-B)     -(B-A)     +(A-B)
(+A)-(-B)    +(A+B)
(-A)-(+B)    -(A+B)
(-A)-(-B)                -(A-B)     +(B-A)     +(A-B)

Computer Arithmetic

Ø Addition (Subtraction) Algorithm

Ø When the signs of A and B are identical (different), add the
  magnitudes and attach the sign of A to the result

Ø When the signs of A and B are different (identical), compare the
  magnitudes and subtract the smaller number from the larger
Ø Choose the sign of the result to be the same as A if A > B,
Ø or the complement of the sign of A if A < B
Ø If A = B, subtract B from A and make the sign of the result positive
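The rules above can be sketched in Python; this is a minimal illustration working on separated sign bits (0 for +, 1 for -) and magnitudes, with names of my own choosing, not the hardware algorithm itself:

```python
def sm_add(sign_a, a, sign_b, b, subtract=False):
    """Signed-magnitude addition/subtraction per the textbook rules."""
    if subtract:
        sign_b ^= 1          # subtraction: complement the sign of B, then add
    if sign_a == sign_b:     # identical signs: add magnitudes, keep sign of A
        return sign_a, a + b
    if a > b:                # different signs: subtract smaller from larger,
        return sign_a, a - b         # result takes the sign of A
    if a < b:
        return sign_a ^ 1, b - a     # result takes the complemented sign of A
    return 0, 0              # A == B: result is +0
```

For example, (+5) + (-3) gives (0, 2), i.e. +2, and (-3) + (+5) gives the same result with the sign complemented from A.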
Hardware Implementation

[Figure: register B (with sign flip-flop Bs) feeds a complementer controlled
by mode control M; a parallel adder with output-carry flip-flop E and
add-overflow flip-flop AVF loads the sum into register A (with sign
flip-flop As)]

A simple procedure would require a magnitude comparator, an adder and two subtractors; the
alternative reveals that using 2's complement for the operation requires only an adder and a
complementer

M = 0: output = A + B        M = 1: output = A + B' + 1 = A - B


Flow Chart for Add and Subtract Operation
Signed 2’s Complement Representation

Ø Addition:

Ø Addition of two numbers in signed 2’s complement form consists of
  adding the numbers with the sign bits treated the same as the other
  bits of the number. The carry out of the sign bit is discarded

Ø The sum is obtained by adding the contents of AC and BR (including the
  sign bit). The overflow bit is set to 1 if the EX-OR of the last two carries is 1

Ø Subtraction:
Ø Subtraction consists of first taking the 2’s complement of the
  subtrahend and then adding it to the minuend

Ø Subtraction is done by adding the content of AC to the 2’s complement of BR
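A minimal Python sketch of 2's complement addition with this overflow test (an illustration under the assumption of an 8-bit word; names are my own):

```python
BITS = 8
MASK = (1 << BITS) - 1

def twos_add(ac, br):
    """Add two BITS-wide 2's-complement words; the carry out of the sign
    bit is discarded. Overflow = XOR of the carries into and out of the
    sign-bit position."""
    raw = ac + br
    result = raw & MASK
    carry_out = raw >> BITS
    # carry into the sign bit: add everything except the sign bits themselves
    carry_in = ((ac & (MASK >> 1)) + (br & (MASK >> 1))) >> (BITS - 1)
    return result, carry_in ^ carry_out

def twos_sub(ac, br):
    """AC - BR = AC + (2's complement of BR)."""
    return twos_add(ac, (~br + 1) & MASK)
```

Adding 100 + 50 overflows an 8-bit word (both operands positive, result bit pattern negative), while 70 + (-20) = 50 does not, even though it produces a discarded carry.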
Signed 2’s Complement Representation
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Binary Multiplication

The sign of the product is determined from the signs of the multiplicand and the multiplier: if they are the same, the product is positive, otherwise negative
H/W Implementation - Multiplication

For multiplication in hardware, the process is slightly changed:

1. In place of registers equal in number to the multiplier bits, use
   one adder and one register

2. Instead of shifting the multiplicand left, the partial product is
   shifted right

3. When the corresponding bit of the multiplier is 0, there is no need
   to add zero to the partial product
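The three changes above amount to the classic sequential shift-add multiplier. A sketch for unsigned magnitudes (register names A and Q follow the usual textbook convention; the function name and default width are assumptions):

```python
def shift_add_multiply(multiplicand, multiplier, bits=8):
    """Hardware-style shift-add multiply: the partial product shifts
    right instead of the multiplicand shifting left, and zero multiplier
    bits cause only a shift."""
    A = 0                       # accumulator (becomes the upper half of the product)
    Q = multiplier              # lower half; multiplier bits consumed LSB-first
    for _ in range(bits):
        if Q & 1:               # add the multiplicand only when the bit is 1
            A += multiplicand
        # shift the A,Q pair right as one unit: A's LSB moves into Q's MSB
        Q = ((A & 1) << (bits - 1)) | (Q >> 1)
        A >>= 1
    return (A << bits) | Q      # concatenated double-length product
```

After `bits` iterations A:Q holds the full double-length product, e.g. 3 x 5 = 15.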
H/W implementation -Multiplication
Flow Chart Binary Multiplication
Booth’s Multiplication Algorithm

Ø Used for multiplication in signed 2’s complement representation

Ø It relies on the fact that strings of 0’s in the multiplier require no
  addition, just shifting

Ø A string of 1’s in the multiplier from bit weight 2^k down to 2^m can be
  treated as 2^(k+1) - 2^m

Ø E.g. +14 (001110), here k = 3, m = 1, represented as 2^4 - 2^1 = 16 - 2 = 14

Thus M x 14 = M x 2^4 - M x 2^1
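The identity is easy to check numerically; the multiplicand value below is an arbitrary choice for illustration:

```python
# string of 1's in 14 = 001110 runs from bit 3 down to bit 1,
# so 14 = 2^4 - 2^1 and the multiply needs one subtraction plus shifts
M = 23                                   # arbitrary multiplicand
assert M * 14 == (M << 4) - (M << 1)     # M x 14 = M x 2^4 - M x 2^1
print("M x 14 computed with a single subtraction and two shifts")
```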
Booth’s Multiplication Algorithm

Ø Used for multiplication in signed 2’s complement representation

Ø Prior to shifting, the multiplicand may be added to the partial product,
  subtracted from the partial product, or left unchanged according to the
  following rules:

1. The multiplicand is subtracted from the partial product upon encountering
   the first least significant 1 in a string of 1’s in the multiplier

2. The multiplicand is added to the partial product upon encountering the
   first 0 (provided that there was a previous 1) in a string of 0’s in the
   multiplier

3. The partial product does not change when the multiplier bit is identical
   to the previous multiplier bit
Booth’s Multiplication Algorithm
Solve using Booth’s Algorithm
−9 × −13

´ BR register (5 bits) holds the multiplicand −9
´ +9 = 0 1001
´ 1’s Compl. = 1 0110
´ 2’s Compl. = 1 0111 (BR = −9)
´ BR’ + 1 = 0 1001 (= +9, used when subtracting BR)

´ QR register (5 bits) holds the multiplier −13
´ +13 = 0 1101
´ 1’s Compl. = 1 0010
´ 2’s Compl. = 1 0011 (QR = −13)
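The worked example can be checked with a short Python sketch of the textbook register-level procedure (AC, QR and the extra bit Q_{n+1}); the function name and unsigned-masking details are my own:

```python
def booth_multiply(m, q, bits=5):
    """Booth's algorithm in signed 2's complement (textbook AC/QR/Qn+1 form)."""
    mask = (1 << bits) - 1
    ac, qr, q_extra = 0, q & mask, 0
    br, br_neg = m & mask, (-m) & mask          # BR and its 2's complement
    for _ in range(bits):
        pair = (qr & 1, q_extra)
        if pair == (1, 0):                       # first 1 of a string: AC <- AC - BR
            ac = (ac + br_neg) & mask
        elif pair == (0, 1):                     # first 0 after 1's: AC <- AC + BR
            ac = (ac + br) & mask
        # arithmetic shift right of AC, QR, Qn+1 as one unit
        q_extra = qr & 1
        sign = ac >> (bits - 1)
        qr = ((ac & 1) << (bits - 1)) | (qr >> 1)
        ac = (sign << (bits - 1)) | (ac >> 1)
    product = (ac << bits) | qr                  # double-length result AC:QR
    if product >> (2 * bits - 1):                # interpret as signed
        product -= 1 << (2 * bits)
    return product
```

Running it on the slide's operands, `booth_multiply(-9, -13)` yields 117.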
Example - Booth’s Multiplication Algorithm
Content
´ Computer Arithmetic

´ Addition and Subtraction Algorithm

´ Multiplication Algorithm

´ Parallel processing

´ Characteristics of Multiprocessors

´ Interconnection Structures

´ Pipelining
Parallel Processing

´ Parallel processing: the execution of concurrent events in the computing
  process to achieve faster computational speed
´ The purpose of parallel processing is to speed up the computer's processing
  capability and increase its throughput, i.e. the amount of processing that
  can be accomplished during a given interval of time
´ The amount of hardware increases with parallel processing, which increases the cost
´ Levels of parallel processing
´ Job or Program level
´ Task or Procedure level
´ Inter-Instruction level
´ Intra-Instruction level
´ Lowest level: shift registers, registers with parallel load
´ Higher level: a multiplicity of functional units that perform identical or different tasks
Multiple Functional Units
Parallel Computers

Architectural Classification
´ Flynn's classification
´ Based on the multiplicity of Instruction Streams and Data Streams
´ Instruction Stream
´ Sequence of Instructions read from memory

´ Data Stream
´ Operations performed on the data in the processor

                                 Number of Data Streams
                                 Single      Multiple
Number of          Single        SISD        SIMD
Instruction
Streams            Multiple      MISD        MIMD
SISD Processors

´ Characteristics
´ Uni-processor machine, capable of executing single instructions, operating on a single
data stream
´ Single computer containing a control unit, processor and memory unit
´ Instructions and data are stored in memory and executed sequentially
´ may or may not have parallel processing
´ parallel processing can be achieved by pipelining

[Figure: Control Unit -> (instruction stream) -> Processor Unit -> (data stream) <-> Memory Unit]
SIMD Processors
´ Characteristics
´ Capable of executing the same instruction on all the processors but operating on different data
streams
´ Only one copy of the program exists
´ A single controller executes one instruction at a time
[Figure: a control unit issues a single instruction stream over the data bus
to processor units P1 ... Pn; each processor has its own data stream and is
connected through an alignment network to memory modules M1 ... Mn]
MISD Processors
´ Characteristics
´ Capable of executing different instructions on different processors, but all of them
operating on the same data set
´ Such machines are not practically useful

[Figure: multiple control units CU1 ... CUn, each driving a processor
P1 ... Pn with its own instruction stream; all processors operate on a
single data stream from memory]
MIMD Processors
´ Characteristics
´ Capable of executing multiple instructions on multiple data sets
´ Capable of processing several programs simultaneously
Interconnection Structure
´ The components that form a multiprocessor system are CPUs, IOPs connected to input-
output devices, and a memory unit
´ The interconnection between the components can have different physical
configurations, depending on the number of transfer paths that are available
´ Between the processors and memory in a shared memory system
´ Among the processing elements in a loosely coupled system

´ There are several physical forms available for establishing an interconnection network.
´ Time-Shared common bus
´ Multiport Memory
´ Crossbar Switch
´ Multistage Switching Network
´ Hypercube System
Time-Shared Common Bus
´ A common-bus multiprocessor system consists of a number of processors connected
through a common path to a memory unit
´ Only one processor can communicate with the memory or another processor at any given
time.
´ As a consequence, the total overall transfer rate within the system is limited by the speed of the
single path
´ Part of the local memory may be designed as a cache memory attached to the CPU
Multiport Memory
´ A multiport memory system employs separate buses between each memory module
and each CPU
´ The module must have internal control logic to determine which port will have access to
memory at any given time
´ Memory access conflicts are resolved by assigning fixed priorities to each memory port.
´ Advantage: The high transfer rate can be achieved because of the multiple paths.
´ Disadvantage: Requires expensive memory control logic and a large number of cables
and connections
Crossbar Switch
´ Consists of a number of cross-points that are placed at intersections between processor
buses and memory module paths
´ The small square in each cross-point is a switch that determines the path from a
processor to a memory module
´ Advantage: Supports simultaneous transfers from all memory modules
´ Disadvantage: The hardware required to implement the switch can become quite large
and complex
[Figure: a 4x4 crossbar - CPU1 ... CPU4 on the rows, memory modules
MM1 ... MM4 on the columns, with a switch at each cross-point]
Multistage Switching Network
Pipelining
´ A technique of decomposing a sequential process into suboperations, with each
  subprocess executed in a special dedicated segment that operates concurrently
  with all other segments

´ It is characteristic of pipelining that several computations can be in progress in
  distinct segments at the same time

´ Each segment performs partial processing, as dictated by the way the task is partitioned

´ The result obtained from the computation in each segment is transferred to the next
  segment in the pipeline

´ The final result is obtained after the data has passed through all segments
Design of Basic Pipeline

´ In a pipelined processor, a pipeline has two ends, the input end and the output end

´ Between these ends, there are multiple stages/segments such that output of one stage
is connected to input of next stage and each stage performs a specific operation

´ Interface registers are used to hold the intermediate output between two stages; these
interface registers are also called latches or buffers

´ All the stages in the pipeline along with the interface registers are controlled by a
common clock
Design of Basic Pipeline
´ Pipeline Stages

´ A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set

´ Following are the 5 stages of the RISC pipeline with their respective operations

´ Stage 1 (Instruction Fetch)

´ In this stage the CPU reads the instruction from the memory address whose value is present in the program counter

´ Stage 2 (Instruction Decode)

´ In this stage, the instruction is decoded and the register file is accessed to get the values of the registers used in the instruction

´ Stage 3 (Instruction Execute)

´ In this stage, ALU operations are performed

´ Stage 4 (Memory Access)

´ In this stage, memory operands are read from or written to the memory address present in the instruction

´ Stage 5 (Write Back)

´ In this stage, the computed/fetched value is written back to the register specified in the instruction
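The overlap of these five stages can be sketched with a tiny scheduler; this is an idealized model assuming no stalls or hazards, and the function name is my own:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n_instructions):
    """Map each instruction to the stage it occupies on each clock cycle
    of an ideal 5-stage pipeline (no stalls or hazards)."""
    total_cycles = len(STAGES) + n_instructions - 1   # k + n - 1 cycles overall
    table = []
    for i in range(n_instructions):
        # instruction i enters IF on cycle i+1 and leaves WB on cycle i+5
        row = {cycle: STAGES[cycle - i - 1]
               for cycle in range(i + 1, i + 1 + len(STAGES))}
        table.append(row)
    return total_cycles, table
```

Three instructions finish in 5 + 3 - 1 = 7 cycles instead of 15 sequential ones.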
Pipelining
´ The simplest way to understand pipelining is to imagine that each segment consists
  of an input register followed by a combinational circuit. The output of the
  combinational circuit in a segment is applied to the input register of the next segment

Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

[Figure: Segment 1 loads Ai, Bi into R1, R2; Segment 2 multiplies R1 * R2
into R3 and loads Ci from memory into R4; Segment 3 adds R3 + R4 into R5]

R1 <- Ai, R2 <- Bi            Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci       Multiply and load Ci
R5 <- R3 + R4                 Add
Operation of the Pipeline Stages

Clock Pulse   Segment 1       Segment 2            Segment 3
Number        R1     R2       R3          R4       R5
1             A1     B1
2             A2     B2       A1 * B1     C1
3             A3     B3       A2 * B2     C2       A1 * B1 + C1
4             A4     B4       A3 * B3     C3       A2 * B2 + C2
5             A5     B5       A4 * B4     C4       A3 * B3 + C3
6             A6     B6       A5 * B5     C5       A4 * B4 + C4
7             A7     B7       A6 * B6     C6       A5 * B5 + C5
8                             A7 * B7     C7       A6 * B6 + C6
9                                                  A7 * B7 + C7
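The clock-by-clock behaviour in the table above can be reproduced by a small simulation; this is a sketch in which Ci is carried alongside R1 and R2 rather than fetched from memory, and all names are my own:

```python
def pipeline_abc(A, B, C):
    """Clock-by-clock simulation of the 3-segment pipeline for Ai*Bi + Ci.
    On every clock all interface registers load simultaneously."""
    n = len(A)
    r12 = r34 = r5 = None                 # latch contents of each segment
    results = []
    for clock in range(n + 2):            # n tasks finish in k + n - 1 = n + 2 clocks
        new_r5 = r34[0] + r34[1] if r34 else None             # segment 3: add
        new_r34 = (r12[0] * r12[1], r12[2]) if r12 else None  # segment 2: multiply, load Ci
        new_r12 = (A[clock], B[clock], C[clock]) if clock < n else None  # segment 1: load
        r12, r34, r5 = new_r12, new_r34, new_r5               # all latches clock together
        if r5 is not None:
            results.append(r5)
    return results
```

For 7 tasks the loop runs 3 + 7 - 1 = 9 clocks, matching the 9 clock pulses in the table.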
General Pipeline
General Structure of a 4-Segment Pipeline

[Figure: input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with all
registers driven by a common clock]

Space-Time Diagram

Clock cycles  1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6
Pipeline Speed-Up
n: Number of tasks to be performed

´ Conventional Machine (Non-Pipelined)


´ tn: Clock cycle (time to complete each task)
´ t1: Time required to complete the ‘n’ tasks
´ t1 = n * tn

´ Pipelined Machine (k stages)


´ tp: Clock cycle (time to complete each suboperation)
´ tk: Time required to complete the n tasks
´ tk = (k + n - 1) * tp

´ Speedup
´ Sk: Speedup

´ Sk = n*tn / ((k + n - 1) * tp)
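The speedup formula is easy to evaluate numerically; the segment count and clock times below are illustrative values, not from the slides:

```python
def speedup(n, k, tn, tp):
    """Pipeline speedup: Sk = n*tn / ((k + n - 1) * tp)."""
    return (n * tn) / ((k + n - 1) * tp)

k, tp = 4, 20          # 4 segments, 20 ns per suboperation (illustrative)
tn = k * tp            # non-pipelined task time equals k clock periods
print(speedup(100, k, tn, tp))   # already close to the limit k = 4
```

As n grows, the result climbs toward k, the maximum theoretical speedup.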
Pipeline Speed-Up
As n becomes much larger than k - 1, (k + n - 1) approaches n

´ Then: S = tn / tp

If we assume the time taken to complete a task is the same in both circuits, then tn = k*tp and the speedup reduces to

´ S = k*tp / tp = k

´ i.e. the maximum theoretical speedup a pipeline can provide is k
