Parallel Processing


PARALLEL PROCESSING

 Parallel processing is a term used for a large class of techniques
that provide simultaneous data-processing tasks for the purpose of
increasing the computational speed of a computer system.
 The system may have two or more ALUs so that it can execute
two or more instructions at the same time.
 The system may have two or more processors operating
concurrently.
 It can be achieved by having multiple functional units that
perform the same or different operations simultaneously.

Example of parallel processing:


- Multiple functional units:

The execution unit is separated into eight functional units operating in
parallel.

 There are a variety of ways in which parallel processing can be
classified:
o Internal organization of processor
o Interconnection structure between processors
o Flow of information through system
Architectural classification:

 Flynn’s classification
 Based on the multiplicity of instruction streams and data streams
 Instruction stream
Sequence of instructions read from memory
 Data stream
Operations performed on the data in the processor

 SISD (single instruction stream, single data stream) represents an
organization containing a single control unit, a processor unit and a
memory unit. Instructions are executed sequentially, and the system
may or may not have internal parallel processing capabilities.
 SIMD (single instruction stream, multiple data stream) represents an
organization that includes many processing units under the
supervision of a common control unit.
 MISD (multiple instruction stream, single data stream) is of only
theoretical interest, since no practical system has been constructed
using this organization.
 MIMD (multiple instruction stream, multiple data stream) refers to a
computer system capable of processing several programs at the same
time.

The main difference between a multicomputer system and a
multiprocessor system is that the multiprocessor system is controlled
by one operating system that provides interaction between processors,
and all components of the system cooperate in the solution of a
problem.

 Parallel processing can be discussed under the following topics:


a) Pipeline processing
b) Vector processing
c) Array processing

Que1. Explain pipelining with an example. Also explain arithmetic
pipelining with an example.

Ans.

 Pipelining is a technique of decomposing a sequential process into
sub-operations, with each sub-operation being executed in a special
dedicated segment that operates concurrently with all other
segments.
 Each segment performs partial processing dictated by the way the
task is partitioned.
 The result obtained from each segment is transferred to the next
segment.
 Suppose we have to perform the following task:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Each sub-operation is to be performed in a segment within a
pipeline. Each segment has one or more registers and a
combinational circuit.
As shown in the table, the first clock pulse transfers A1 and B1 into
R1 and R2. The second clock pulse does the following three
sub-operations:
a) Transfers the product of R1 and R2 into R3,
b) Transfers C1 to R4, and
c) Transfers A2 and B2 into R1 and R2 respectively.
The third clock pulse performs the following four
sub-operations:
a) Transfers the product of R1 and R2 into R3,
b) Transfers C2 to R4,
c) Transfers A3 and B3 into R1 and R2, and
d) Transfers the sum of R3 and R4 into R5.
Hence, once the pipeline is full, each clock pulse produces a new
output and moves the data one step down the pipeline.
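
The clock-by-clock behaviour described above can be sketched in Python. This is only an illustrative simulation, not a hardware description: the function name run_pipeline and the use of None for "register not loaded yet" are assumptions made for the sketch. The registers R1 to R5 follow the description above.

```python
def run_pipeline(a, b, c):
    """Simulate the three-segment pipeline that computes a[i] * b[i] + c[i]."""
    r1 = r2 = r3 = r4 = r5 = None  # None marks a register that holds no data yet
    n = len(a)
    results = []
    for clock in range(1, n + 3):  # n inputs need n + 2 clock pulses to drain
        i = clock - 1              # index of the operand pair entering segment 1
        # All registers load on the same clock pulse, so every new value is
        # computed from the OLD register contents before anything is updated.
        new_r5 = r3 + r4 if r3 is not None else None   # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None   # segment 2: multiply
        new_r4 = c[i - 1] if 1 <= i <= n else None     # segment 2: load Ci
        new_r1 = a[i] if i < n else None               # segment 1: load Ai
        new_r2 = b[i] if i < n else None               # segment 1: load Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results


print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The first result appears on the third clock pulse; after that, one result is produced per pulse.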
Types of pipelining:

 Arithmetic pipeline
 Instruction pipeline

ARITHMETIC PIPELINE:

 Pipelined arithmetic units are usually found in very high speed
computers.
 They are used to implement floating point operations.
 We will now discuss the pipeline unit for floating point
addition and subtraction.
 The inputs to the floating point adder pipeline are two normalized
floating point numbers, X and Y.
 A and B are the mantissas and a and b are the exponents.
 Floating point addition and subtraction can be performed in
four segments:
1) Compare the exponents
2) Align the mantissas
3) Add or subtract the mantissas
4) Normalize the result

For example, with decimal numbers:

X = A * 10^a = 0.9504 * 10^3
Y = B * 10^b = 0.8200 * 10^2
1) Compare exponents:
3 - 2 = 1
2) Align mantissas
X = 0.9504 * 10^3
Y = 0.0820 * 10^3
3) Add mantissas
Z = 1.0324 * 10^3
4) Normalize result
Z = 0.10324 * 10^4
PIPELINE FOR FLOATING POINT ADDITION AND SUBTRACTION
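
The four segments can be traced in software with a small Python sketch. It is a sequential illustration of the steps only, not a pipelined implementation. The function name fp_add is hypothetical, and it assumes decimal mantissas and radix 10, as in the worked example.

```python
from decimal import Decimal


def fp_add(mant_x, exp_x, mant_y, exp_y):
    """Add two normalized decimal floating point numbers (mantissa * 10**exponent)."""
    # Segment 1: compare the exponents; the larger one becomes the result exponent.
    diff = exp_x - exp_y
    exp_z = max(exp_x, exp_y)
    # Segment 2: align the mantissa that belongs to the smaller exponent
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        mant_y = mant_y.scaleb(-diff)
    else:
        mant_x = mant_x.scaleb(diff)
    # Segment 3: add the mantissas (pass a negative mantissa to subtract).
    mant_z = mant_x + mant_y
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant_z) >= 1:
        mant_z = mant_z.scaleb(-1)
        exp_z += 1
    while mant_z != 0 and abs(mant_z) < Decimal("0.1"):
        mant_z = mant_z.scaleb(1)
        exp_z -= 1
    return mant_z, exp_z


mantissa, exponent = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(mantissa, exponent)
```

Decimal is used instead of float so that the mantissa digits stay exact while they are shifted.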

INSTRUCTION PIPELINE:

 Pipeline processing can occur not only in the data stream but in
the instruction stream as well.
 An instruction pipeline reads consecutive instructions from
memory while previous instructions are being executed in other
segments.
 This causes the instruction fetch and execute segments to
overlap and perform simultaneous operations.

Four-segment CPU pipeline:

 The FI segment fetches the instruction.
 The DA segment decodes the instruction and calculates the
effective address.
 The FO segment fetches the operand.
 The EX segment executes the instruction.

EXAMPLE OF FOUR SEGMENT INSTRUCTION PIPELINE

According to this figure, while one instruction is being executed in
segment 4, the next instruction is fetching an operand from memory in
segment 3. The third instruction is having its effective address
calculated in segment 2, and segment 1 is fetching the fourth
instruction from memory. Thus, these sub-operations of the instruction
cycle overlap and are processed at the same time.
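
The overlap can be made concrete by printing which instruction occupies each segment in every clock cycle. The sketch below is a simplified model: it assumes no branches and no memory conflicts, so every instruction advances one segment per cycle. The name space_time is hypothetical.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def space_time(num_instructions):
    """Return one row per clock cycle: the instruction number in each segment (None if idle)."""
    rows = []
    total_cycles = num_instructions + len(SEGMENTS) - 1
    for cycle in range(total_cycles):
        row = []
        for seg_index in range(len(SEGMENTS)):
            instr = cycle - seg_index  # instruction k (0-based) enters FI in cycle k
            row.append(instr + 1 if 0 <= instr < num_instructions else None)
        rows.append(row)
    return rows


for cycle, row in enumerate(space_time(4), start=1):
    cells = " ".join(f"{seg}:{'I' + str(i) if i else '--'}" for seg, i in zip(SEGMENTS, row))
    print(f"cycle {cycle}: {cells}")
```

In cycle 4 all four segments are busy at once, which is the situation the paragraph above describes.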

VECTOR PROCESSOR

 Vector processors are SISD processors whose instruction set includes
instructions operating on vectors. They are implemented using
pipelined functional units.
 A vector unit typically consists of pipelined functional units and
vector registers.
 Vector processors are not parallel processors: there are not
several CPUs running in parallel.
 Vector computers usually have vector registers, each of which can
store 64 to 128 words.
 Besides scalar instructions, the instruction set of a vector
processor also includes instructions operating on vectors.
 Vector instructions:
- Load a vector from memory into a vector register
- Store a vector into memory
- Arithmetic and logic operations between vectors
- Operations between vectors and scalars, etc.

Vector processing

Applications of vector processing:

There are many application areas where vector processing is
important or in demand. Some of the applications are:

 Artificial intelligence and expert systems


 Image processing
 Medical diagnosis
 Mapping the human genome
 Long-range weather forecasting
 Seismic data analysis
 Petroleum explorations.

Vector operations:

 Many problems require arithmetic operations on large arrays of numbers.
 On a conventional scalar processor, such an operation is programmed as a loop.

Consider a program that adds two vectors A and B of length 100 and
puts the result in vector C:

Initialize I = 1
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I <= 100 goto 20
Continue

Single vector instruction:

C(1:100) = A(1:100) + B(1:100)
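
The contrast between the two forms can be sketched in Python. This only illustrates the notation: Python still processes the elements one at a time, whereas a vector processor would issue one instruction to its pipelined functional units. The function names are hypothetical.

```python
def add_scalar_loop(a, b):
    """Element-by-element addition with an explicit index, test and branch."""
    c = [0] * len(a)
    i = 0  # Python lists are 0-based, so the index runs over 0..len(a)-1
    while i < len(a):
        c[i] = a[i] + b[i]
        i += 1
    return c


def add_vector(a, b):
    """Whole-vector addition written as a single expression."""
    return [x + y for x, y in zip(a, b)]


A = list(range(1, 101))
B = [10 * x for x in A]
print(add_scalar_loop(A, B) == add_vector(A, B))
```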

ARRAY PROCESSORS (OR ARRAY PROCESSING)

An array processor is a processor that performs computations on
large arrays of data. The term is used to refer to two different types of
processors:
1) Attached array processor
2) SIMD array processor

 Attached array processor: an attached array processor is an
auxiliary processor attached to a general purpose computer.
It is intended to improve the performance of the host
computer in specific numerical computation tasks. The array
processor can be programmed by the user to accommodate a
variety of complex arithmetic problems.

The figure shows the interconnection of an attached array processor to
a host computer.

 SIMD array processor: an SIMD array processor is a single
computer containing multiple processors operating in parallel. A
general block diagram of an array processor is shown below. It contains
a set of identical processing elements (PEs), each of which
includes an ALU and registers. The best known SIMD array processor
is the ILLIAC IV computer, developed at the University of Illinois and
built by the Burroughs Corporation. SIMD processors are highly
specialized computers. They are suitable only for numerical problems
that can be expressed in vector or matrix form and are not suitable for
other types of computations.

DATA TRANSFER AND MANIPULATION

Most computer instructions can be classified into three categories:

a) Data transfer instructions

b) Data manipulation instructions

c) Program control instructions

Data transfer instructions

Data transfer instructions move data from one place in the computer to
another without changing the data content.
The most common data transfers are between memory and processor
registers, between processor registers and input or output, and
between the processor registers themselves.

Data manipulation instructions

Data manipulation instructions perform operations on data and
provide the computational capabilities of the computer.

They are divided into three basic types:

a) Arithmetic instructions
b) Logical and bit manipulation instructions
c) Shift instructions

Program control instructions

Program control instructions specify conditions for altering the content
of the program counter, while data transfer and manipulation
instructions specify conditions for data-processing operations.
Que. What is meant by CPU organization? Discuss the different types
of CPU organization.

Ans. The CPU is the brain of the computer. It sits between the input
and output units, and all major calculations are performed in it. It is
also responsible for activating and controlling the operations of the
other units. A CPU is made up of three major parts:

Register set: the register set holds the instructions and data (both
intermediate and final) that are used during the execution of
instructions.

ALU: the ALU performs the four basic arithmetic operations (add,
subtract, multiply, and divide) and logic operations or comparisons (is
less than, is equal to, and is greater than).

Control unit: it supervises the transfer of information among the
registers and instructs the ALU as to which operation to perform.

A CPU or processor can be organized in a number of ways. The
organization used is usually one of the following:

Accumulator organization: in this type of CPU organization, all
operations are performed using an implied accumulator register.

In this type of organization, the instruction format has one address field.

For example, the instruction for arithmetic addition can be defined in
assembly language as:

ADD X

In the above instruction, X is the address of the operand. The
instruction results in the operation:

AC ← AC + M[X]

Here AC denotes the accumulator register and M[X] denotes the
memory word stored at address X.

General register organization

The registers used in this organization are:

The accumulator (AC or A) register, which is the main operand
register of the ALU.

The data register (DR), which acts as a buffer between the CPU and main
memory. It is used as an input operand register with the accumulator.

The instruction register (IR), which holds the operation code of the
current instruction.

The address register (AR), which holds the address of the memory
location in which the operand resides.

The program counter (PC), which holds the address of the next
instruction to be fetched for execution.

In this type of CPU organization, the instruction format needs three
register address fields. Thus the instruction for arithmetic addition is
defined in assembly language as:

ADD R1, R2, R3

In the above instruction, R1, R2 and R3 denote three different CPU
registers. The instruction results in the operation:

R1 ← R2 + R3

If the destination register is the same as one of the source registers,
only two register address fields are needed:

ADD R1, R2

The above instruction results in the operation:

R1 ← R1 + R2

General register type computers employ two or three address fields in
the instruction format. An address field can also specify a memory word:

ADD R2, X

The above instruction results in the operation:

R2 ← R2 + M[X]
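
The register transfers for the different ADD formats can be compared side by side in a small Python sketch. The dictionaries standing in for registers and memory, and all function names, are hypothetical. The last function assumes that the register named in the instruction is both a source and the destination.

```python
def add_accumulator(state, x):
    """ADD X : AC <- AC + M[X] (one address field, accumulator implied)."""
    state["AC"] += state["M"][x]


def add_three_register(regs, dst, src1, src2):
    """ADD R1, R2, R3 : R1 <- R2 + R3 (three register address fields)."""
    regs[dst] = regs[src1] + regs[src2]


def add_two_register(regs, dst, src):
    """ADD R1, R2 : R1 <- R1 + R2 (the destination doubles as a source)."""
    regs[dst] += regs[src]


def add_register_memory(regs, memory, dst, x):
    """ADD R2, X : R2 <- R2 + M[X] (one register field, one memory address; assumed form)."""
    regs[dst] += memory[x]


memory = {100: 7}
cpu = {"AC": 5, "M": memory}
add_accumulator(cpu, 100)                    # AC becomes 5 + 7

regs = {"R1": 0, "R2": 2, "R3": 3}
add_three_register(regs, "R1", "R2", "R3")   # R1 becomes 2 + 3
add_two_register(regs, "R1", "R2")           # R1 becomes 5 + 2
add_register_memory(regs, memory, "R2", 100) # R2 becomes 2 + 7
print(cpu["AC"], regs)
```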
Stack organization: a stack is a LIFO (last in, first out) list of data
elements, usually bytes or words, with the access restriction that
elements can be added or deleted only at the top of the list.

A stack is a part of the register unit or memory unit, together with a
register that holds the address of the top of the stack.

The part of the register array or memory used for the stack is called
the stack area.

The stack pointer (SP) register holds the address of the stack. It
always points at the top item of the stack.

A stack can be implemented in two ways:

Register stack
Memory stack

Register stack:

It is organized as a collection of a finite number of CPU registers.
The stack pointer holds the address of the register that is currently the
top of the stack.

In the above figure, 4 elements are stored in the stack.
Data element 3 is at the top of the stack, therefore the content of SP = 4.
The stack pointer is a 5-bit register because there are 32 word registers
in the stack.

SP = 0 means the stack is empty.
When a data element is pushed onto the stack, SP is incremented.

Memory stack:

It is implemented using computer memory.
The operation of a memory stack is similar to that of a register stack.
Using memory, the size of the stack can be extended up to the size of memory.
A memory stack is slower than a register stack because a register
stack is internal to the CPU and does not need any memory access.
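
The push and pop behaviour of a register stack can be sketched in Python as follows. This is a minimal model under stated assumptions: the class name, the capacity parameter and the decision to leave slot 0 unused (so that SP = 0 can mean "empty") are choices made for the sketch, not details given in the notes.

```python
class RegisterStack:
    """A small stack addressed by a stack pointer (SP); SP == 0 means empty."""

    def __init__(self, capacity):
        # Slot 0 is never used, so SP == 0 can signal an empty stack (assumption).
        self.registers = [None] * (capacity + 1)
        self.capacity = capacity
        self.sp = 0

    def push(self, item):
        if self.sp == self.capacity:
            raise OverflowError("stack is full")
        self.sp += 1                      # SP is incremented on a push
        self.registers[self.sp] = item    # the item is stored at the new top

    def pop(self):
        if self.sp == 0:
            raise IndexError("stack is empty")
        item = self.registers[self.sp]    # read the item at the top
        self.sp -= 1                      # SP is decremented on a pop
        return item


stack = RegisterStack(capacity=8)
for element in ["element 0", "element 1", "element 2", "element 3"]:
    stack.push(element)
print(stack.sp, stack.pop(), stack.sp)
```

After four pushes SP is 4 and the last element pushed is the first one popped, which is the LIFO behaviour.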
Que. Explain Booth’s algorithm in detail with a flow chart.

Ans. Booth’s algorithm provides steps to perform multiplication of
signed binary numbers. Using this algorithm, there is no need to convert
the final result into its 2’s complement form, even if the signs of the
multiplier and the multiplicand are not the same.

The algorithm works for positive or negative multipliers in 2’s
complement representation. The hardware implementation of Booth’s
algorithm requires the register configuration shown in the figure.

Here BR, AC, and QR are registers. Qn designates the least significant
bit of the multiplier in register QR. An extra flip-flop Qn+1 is appended
to QR to facilitate a double-bit inspection of the multiplier. The flow
chart of Booth’s algorithm is shown below.

PROCEDURE:

1) Let M be the multiplicand.
2) Let Q be the multiplier.
3) Consider a 1-bit register Q-1 and initialize it to 0.
4) Consider a register A and initialize it to 0.

CONDITIONS:

1) If Q0 Q-1 are the same, i.e. 00 or 11, then perform an arithmetic
right shift by 1 bit.
2) If Q0 Q-1 = 10, then perform A ← A - M and then perform an
arithmetic right shift.
3) If Q0 Q-1 = 01, then perform A ← A + M and then perform an
arithmetic right shift.

The arithmetic right shift is applied to A, Q and Q-1 taken together as
one register.
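Flow chart of Booth’s algorithm:

START
A = 0, Q-1 = 0, M = multiplicand, Q = multiplier, Count = n (number of bits)
Test the bit pair Qn Qn+1 (Q0 Q-1 in the procedure above):
  10: A = A - M
  01: A = A + M
  00 or 11: no operation
Arithmetic shift right
Count = Count - 1
Count = 0?
  NO: go back to the test
  YES: END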
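
The procedure can be followed step by step in a short Python sketch. It is an illustration under stated assumptions rather than a hardware description: the function name booth_multiply is hypothetical, A and Q are both n bits wide, and the multiplicand is assumed not to be the most negative n-bit value (that case needs a wider A register).

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit 2's complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    m = multiplicand & mask      # M, held as an n-bit pattern
    a = 0                        # A, initialized to 0
    q = multiplier & mask        # Q, the multiplier
    q_1 = 0                      # Q-1, the extra bit to the right of Q
    for _ in range(n):           # one pass per multiplier bit (Count = n ... 1)
        q0 = q & 1
        if (q0, q_1) == (1, 0):
            a = (a - m) & mask   # A <- A - M
        elif (q0, q_1) == (0, 1):
            a = (a + m) & mask   # A <- A + M
        # Arithmetic right shift of A, Q, Q-1 taken as one register.
        q_1 = q0
        q = (q >> 1) | ((a & 1) << (n - 1))
        sign = a >> (n - 1)
        a = (a >> 1) | (sign << (n - 1))
    product = (a << n) | q       # the 2n-bit product is held in A and Q
    if product >> (2 * n - 1):   # reinterpret the bit pattern as a signed value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(3, -4, 4))
```

No separate sign correction is applied at the end; the only post-processing is reading the 2n-bit pattern in A and Q as a signed number.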


Que. How are floating point numbers stored in a computer? Explain
normalization in relation to floating point, and how the overflow and
underflow conditions occur in normalized floating point operations.

Ans. Representation of floating point (real) numbers

Real numbers are also represented in the computer in binary form.
Like integers, floating point numbers may be positive
or negative. There are two methods of specifying the position of the
binary point in a CPU register: the first is fixed point representation and
the second is floating point representation. Let us see both of them.

Fixed point representation:

The fixed point method assumes that the binary point is always fixed
in one position. Two positions are widely used:

 A binary point in the extreme left of the register to make the stored
number a fraction and,
 A binary point in the extreme right of the register to make the
stored number an integer.

Floating point representation


The floating point representation of a number has two parts. The first
part represents a signed, fixed point number called the mantissa. The
second part designates the position of the decimal (or binary) point and
is called the exponent.

For example, the floating point decimal number +613.2789 is
represented as:

Mantissa       Exponent
+0.6132789     +03

This representation is equivalent to the scientific notation
+0.6132789 * 10^(+3).

A floating point number is always interpreted in the following form:

M * r^e

where M is the mantissa, r is the radix and e is the exponent.

For example, the binary number +1001.11 is represented with an 8-bit
fraction and a 6-bit exponent as follows:

Mantissa     Exponent
01001110     000100

The floating point number is equivalent to:

M * 2^e = +(0.1001110)_2 * 2^(+4)
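
The binary example can be checked with a few lines of Python. The sketch assumes that both fields are stored in sign-magnitude form, with the leftmost bit as the sign, and that the mantissa is a pure fraction. The function name decode_fields is hypothetical.

```python
def decode_fields(mantissa_bits, exponent_bits):
    """Decode a sign-magnitude fraction and a sign-magnitude exponent (radix 2)."""

    def sign_magnitude(bits):
        magnitude = int(bits[1:], 2)          # the bits after the sign bit
        return -magnitude if bits[0] == "1" else magnitude

    fraction_digits = len(mantissa_bits) - 1  # binary point sits right after the sign bit
    mantissa = sign_magnitude(mantissa_bits) / 2 ** fraction_digits
    exponent = sign_magnitude(exponent_bits)
    return mantissa * 2.0 ** exponent


print(decode_fields("01001110", "000100"))
```

The result is the decimal value of the binary number +1001.11, namely 8 + 1 + 0.5 + 0.25.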

Difference between RISC and CISC

A computer with a large number of instructions is classified as a
complex instruction set computer (CISC). In the early 1980s, a number of
computer designers recommended that computers use fewer instructions
with simple constructs, so that they can be executed much faster within
the CPU without having to use memory as often. This type of computer is
classified as a reduced instruction set computer (RISC).

The major characteristics of CISC are:

A large number of instructions, typically 100 to 250.

A large variety of addressing modes, typically 5 to 20.

Variable length instruction formats.

A microprogrammed control unit is used.

Instructions can manipulate operands directly in memory.

The major characteristics of RISC are:

Relatively few instructions.

Relatively few addressing modes.

Memory access is limited to load and store instructions.

Fixed length instruction formats.

Single cycle instruction execution.

Hardwired control unit is used.
