
Chapter 6.

Arithmetic

Outline

A basic operation in all digital computers is the addition or subtraction of two numbers.

ALU: AND, OR, NOT, XOR
Unsigned/signed numbers
Addition/subtraction
Multiplication
Division
Floating-point number operations

Adders

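Addition of Unsigned Numbers: Half Adder

(a) The four possible cases

      x       0      0      1      1
    + y     + 0    + 1    + 0    + 1
    ---     ---    ---    ---    ---
    c s     0 0    0 1    0 1    1 0      (c = carry, s = sum)

(b) Truth table

    x  y | Carry c | Sum s
    0  0 |    0    |   0
    0  1 |    0    |   1
    1  0 |    0    |   1
    1  1 |    1    |   0

(c) Circuit: inputs x and y, outputs s and c.

(d) Graphical symbol: a block labeled HA with inputs x, y and outputs s, c.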

Addition and Subtraction of Signed Numbers


    x_i  y_i  Carry-in c_i | Sum s_i  Carry-out c_{i+1}
     0    0       0        |    0           0
     0    0       1        |    1           0
     0    1       0        |    1           0
     0    1       1        |    0           1
     1    0       0        |    1           0
     1    0       1        |    0           1
     1    1       0        |    0           1
     1    1       1        |    1           1

$$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$$

$$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$$

Example:

                      1100      carries c_3 c_2 c_1 c_0 (carry-out c_4 = 0)
      X      7        0111      x_i
    + Y    + 6      + 0110      y_i
    ---   ----      ------
      Z     13        1101      s_i

Figure 6.1. Logic specification for a stage of binary addition.
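The logic specification above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `full_adder` is an illustrative choice. It evaluates the two expressions for s_i and c_{i+1} and compares every row of the truth table with ordinary integer addition.

```python
from itertools import product


def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One stage of binary addition: returns (sum bit, carry-out)."""
    s = x ^ y ^ c_in                              # s_i = x_i xor y_i xor c_i
    c_out = (y & c_in) | (x & c_in) | (x & y)     # c_{i+1} = y_i c_i + x_i c_i + x_i y_i
    return s, c_out


# Check all eight truth-table rows against integer addition.
for x, y, c in product((0, 1), repeat=3):
    s, c_out = full_adder(x, y, c)
    assert 2 * c_out + s == x + y + c
```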

Addition and Subtraction of Signed Numbers


A full adder (FA)

[Figure 6.2(a): Logic for a single stage. A full adder (FA) block takes inputs x_i, y_i and carry-in c_i and produces the sum s_i and carry-out c_{i+1}.]

Addition and Subtraction of Signed Numbers


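n-bit ripple-carry adder

[Figure 6.2(b): An n-bit ripple-carry adder. Full-adder stages are chained from the least significant bit (LSB) position to the most significant bit (MSB) position. Stage i takes x_i, y_i and c_i and produces s_i and c_{i+1}; c_0 enters at the LSB stage and c_n leaves the MSB stage.]

Overflow? For 2's-complement operands, overflow occurs when c_n XOR c_{n-1} = 1.
Subtraction?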

Addition and Subtraction of Signed Numbers


kn-bit ripple-carry adder

[Figure 6.2(c): Cascade of k n-bit adders. The first n-bit adder takes x_{n-1}...x_0, y_{n-1}...y_0 and c_0 and produces s_{n-1}...s_0 and c_n; the second handles bits n through 2n-1; the last handles bits (k-1)n through kn-1 and produces c_{kn}.]

Figure 6.2. Logic for addition of binary vectors.

Addition and Subtraction of Signed Numbers


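Addition/subtraction logic unit

[Figure 6.3: An n-bit adder with inputs x_{n-1}...x_0 and y_{n-1}...y_0. The Add/Sub control line conditions the y inputs and also supplies c_0. Outputs: c_n and s_{n-1}...s_0.]

Figure 6.3. Binary addition-subtraction logic network.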
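The ripple-carry adder and the Add/Sub control can be modeled together. The Python sketch below is illustrative only: it assumes the usual arrangement in which each y_i is XORed with the control line and the control line also drives c_0, and the names `add_sub` and `full_adder` are not from the slides. Overflow is taken as c_n XOR c_{n-1}.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    return x ^ y ^ c, (x & y) | (x & c) | (y & c)


def add_sub(x: int, y: int, n: int, subtract: bool = False) -> tuple[int, bool]:
    """n-bit ripple-carry add/subtract on 2's-complement operands.

    Returns (n-bit result read as a signed integer, overflow flag).
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                       # c_0 = Add/Sub control (assumed wiring)
    c_into_msb = carry
    result = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ ctrl     # complement y when subtracting
        c_into_msb = carry             # after the loop this holds c_{n-1}
        s, carry = full_adder(xi, yi, carry)
        result |= s << i
    overflow = bool(carry ^ c_into_msb)   # c_n xor c_{n-1}
    if result >> (n - 1):                 # sign bit set: reinterpret as negative
        result -= 1 << n
    return result, overflow


print(add_sub(7, 6, 5))                  # 5 bits are enough for 13
print(add_sub(7, 6, 4))                  # 4 bits are not: overflow is flagged
print(add_sub(5, 7, 4, subtract=True))
```

With 4-bit operands, 7 + 6 cannot be represented, so the overflow flag is raised and the returned bit pattern 1101 reads as -3.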

Make Addition Faster

Ripple-Carry Adder (RCA)

Straightforward design
Simple circuit structure
Easy to understand
Most power efficient
Slowest (critical path is too long: 2n gate delays)

Adders
We can view addition in terms of generate, G[i], and propagate, P[i].

Carry-lookahead Logic
Carry generate:  $G_i = A_i B_i$  (must generate a carry when $A_i = B_i = 1$)
Carry propagate: $P_i = A_i \oplus B_i$  (carry-out will equal carry-in here)

Sum and carry can be re-expressed in terms of generate, propagate and $C_i$:

$$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$$

$$C_{i+1} = A_i B_i + A_i C_i + B_i C_i = A_i B_i + C_i (A_i + B_i) = A_i B_i + C_i (A_i \oplus B_i) = G_i + C_i P_i$$

Carry-lookahead Logic
Re-express the carry logic as follows:

$$C_1 = G_0 + P_0 C_0$$

$$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_0$$

$$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$$

$$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$$

Each of the carry equations can be implemented in a two-level logic network.
The variables are the adder inputs and the carry-in to stage 0!
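The expanded carry equations can be exercised directly. This is a small Python sketch under the assumption of a 4-bit adder; `cla_4bit` is an illustrative name. Every carry is computed from G, P and C0 only, with no carry rippling from stage to stage.

```python
def cla_4bit(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry-lookahead addition; returns (4-bit sum, C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]      # generate
    P = [A[i] ^ B[i] for i in range(4)]      # propagate

    # Each carry is a two-level AND-OR function of G, P and C0.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0])
          | (P[3] & P[2] & P[1] & P[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((P[i] ^ carries[i]) << i for i in range(4))   # S_i = P_i xor C_i
    return s, c4
```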

Carry-lookahead Implementation
[Figure: Carry-lookahead implementation.
 (a) Adder stage with propagate and generate outputs: from A_i and B_i, P_i and G_i are available at 1 gate delay; S_i is available at 2 gate delays.
 (b)-(d) Two-level AND-OR networks that form C1, C2, C3 and C4 from C0 and the P_i and G_i signals. The logic becomes increasingly complex for the later carries.]

P_i and G_i are obtained in 1 gate delay (fig. (a)). C_i needs 2 more gate delays (fig. (b)), for a total of 3 gate delays for each carry. S_i needs one more gate delay, so the sum bits are available after four gate delays.

The more complex logic requires higher fan-in.

Carry-lookahead Logic
Cascaded Carry Lookahead: 4-bit adder

Carry-lookahead logic generates the individual carries, so the sums are computed much faster.

Signal timing in gate delays (inputs C0, A_i and B_i at time 0):
    C1 @ 3    C2 @ 3    C3 @ 3    C4 @ 3
    S0 @ 2    S1 @ 4    S2 @ 4    S3 @ 4

Carry-lookahead Logic Extension


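[Figure 6.5: Four 4-bit adders operate on x15-12/y15-12, x11-8/y11-8, x7-4/y7-4 and x3-0/y3-0 and produce s15-12, s11-8, s7-4 and s3-0. Block k sends first-level group signals G_k^I and P_k^I to the carry-lookahead logic, which takes c0 and returns c4, c8, c12 and c16, together with second-level signals G_0^II and P_0^II.]

Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4b).

If four 4-bit carry-lookahead adders are simply cascaded, c4 is available after 3 gate delays, c8 after 2 more, c12 after 2 more, and c16 after 2 more. The sum bits need 1 more gate delay, for a total of 10 gate delays, compared with 32 for a 16-bit RCA.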

Carry-lookahead Logic
[Figure: 16-bit adder built from four 4-bit adders and a lookahead carry unit. Each 4-bit adder takes A[i+3..i], B[i+3..i] and a carry-in and produces a 4-bit sum plus group P and G outputs. Timing in gate delays: C0 @ 0; each block's P @ 2 and G @ 3; lookahead carry unit outputs C2 @ 5, C3 @ 5, C16 @ 5, P3-0 @ 3, G3-0 @ 5; sums S[3-0] @ 4, S[7-4] @ 7, S[11-8] @ 8, S[15-12] @ 8.]

Lookahead Carry Unit

4-bit adders with internal carry lookahead.
A second-level carry-lookahead unit extends lookahead to 16 bits.

Group propagate: $P = P_3 P_2 P_1 P_0$
Group generate:  $G = G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1$
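The group propagate and generate signals let the same lookahead logic be reused one level up. The Python sketch below is an illustration under assumed names (`lookahead_carries`, `cla_16bit`); it models logic values only, not gate delays. Block carries c4, c8, c12 and c16 come from the group signals, and each 4-bit block then forms its internal carries from its own block carry-in.

```python
def lookahead_carries(G: list[int], P: list[int], c0: int) -> list[int]:
    """Carries c1..c4 from four (G, P) pairs, each as a flattened AND-OR sum."""
    carries = []
    for k in range(1, 5):
        c = 0
        for j in range(k):                 # term G[j] P[j+1] ... P[k-1]
            term = G[j]
            for m in range(j + 1, k):
                term &= P[m]
            c |= term
        term = c0                          # term c0 P[0] ... P[k-1]
        for m in range(k):
            term &= P[m]
        carries.append(c | term)
    return carries


def cla_16bit(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit addition with two levels of lookahead; returns (sum, c16)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]

    group_g, group_p = [], []
    for k in range(4):
        gk, pk = g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]
        group_p.append(pk[3] & pk[2] & pk[1] & pk[0])
        group_g.append(gk[3] | (gk[2] & pk[3]) | (gk[1] & pk[3] & pk[2])
                       | (gk[0] & pk[3] & pk[2] & pk[1]))

    c4, c8, c12, c16 = lookahead_carries(group_g, group_p, c0)
    block_in = [c0, c4, c8, c12]

    s = 0
    for k in range(4):
        inner = lookahead_carries(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4], block_in[k])
        cin = [block_in[k]] + inner[:3]    # carry into each bit of block k
        for i in range(4):
            s |= (p[4 * k + i] ^ cin[i]) << (4 * k + i)
    return s, c16
```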

Unsigned Multiplication


Manual Multiplication Algorithm


         1101    (13)  Multiplicand M
       × 1011    (11)  Multiplier Q
       ------
         1101
        1101
       0000
      1101
     --------
     10001111    (143) Product P

(a) Manual multiplication algorithm

Array Multiplication
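[Figure (b): Array implementation. The multiplicand bits m_3...m_0 feed every row. Partial product PP0 = 0 enters the first row; row i is controlled by multiplier bit q_i and turns the incoming partial product PPi into PP(i+1); PP4 = p_7, p_6, ..., p_0 = Product.
 Typical cell: m_j and q_i are combined and fed to a full adder (FA) together with the bit of the incoming partial product (PPi) and the carry-in; the FA produces the carry-out and the bit of the outgoing partial product [PP(i+1)].]

(b) Array implementation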

Another Version of 4×4 Array Multiplier

Array Multiplication
What is the critical path (worst-case signal propagation delay path)?
Assuming that there are two gate delays from the inputs to the outputs of a full adder block, the path has a total of 6(n-1)-1 gate delays, including the initial AND gate delay in all cells, for the n×n array.
Any advantages/disadvantages?

Sequential Circuit Binary Multiplier


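[Figure (a): Register configuration. Register A (initially 0) holds a_{n-1}...a_0 and multiplier register Q holds q_{n-1}...q_0; C, A and Q shift right together. The multiplicand M (m_{n-1}...m_0) and 0 feed a MUX whose output goes to an n-bit adder together with A. A control sequencer uses q_0 to issue the Add/No-add control.]

(a) Register configuration

    M = 1101

                          C    A       Q
    Initial configuration 0   0000    1011
    First cycle    Add    0   1101    1011
                   Shift  0   0110    1101
    Second cycle   Add    1   0011    1101
                   Shift  0   1001    1110
    Third cycle    No add 0   1001    1110
                   Shift  0   0100    1111
    Fourth cycle   Add    1   0001    1111
                   Shift  0   1000    1111

    Product = 1000 1111 (held in A and Q)

(b) Multiplication example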
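The register-level behavior of the sequential multiplier can be sketched in a few lines. This Python model is illustrative (the name `shift_add_multiply` is an assumption); it keeps C, A and Q as separate integers and repeats the add/no-add and shift-right steps n times.

```python
def shift_add_multiply(m: int, q: int, n: int) -> int:
    """Unsigned n-bit multiplication using registers C, A and Q."""
    mask = (1 << n) - 1
    c, a = 0, 0
    for _ in range(n):
        if q & 1:                              # q0 = 1: add M to A
            total = a + m
            c, a = total >> n, total & mask    # C catches the carry-out
        # Shift C, A and Q right one position as a single register.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q                        # product held in A and Q


print(bin(shift_add_multiply(0b1101, 0b1011, 4)))   # 13 × 11
```

For M = 1101 and Q = 1011 the intermediate values of A and Q follow the add and shift steps of the four cycles in the multiplication example.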

Signed Multiplication


Signed Multiplication

Considering 2's-complement signed operands, what will happen to (-13) × (+11) if we follow the same method as for unsigned multiplication?

          10011    (-13)
        × 01011    (+11)
     ----------
     1111110011
     111110011
     00000000
     1110011
     000000
     ----------
     1101110001    (-143)

The leading 1s in the nonzero summands are the sign extension of the negative multiplicand (shown in blue in the original figure).

Figure 6.8. Sign extension of negative multiplicand.

Signed Multiplication
For a negative multiplier, a straightforward solution is to form the 2's-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.
This is possible because complementation of both operands does not change the value or the sign of the product.
A technique that works equally well for both negative and positive multipliers is the Booth algorithm.

Booth Algorithm
Consider a multiplication in which the multiplier is the positive number 0011110. How many appropriately shifted versions of the multiplicand are added in the standard procedure?

            0101101
            0011110      (digits: 0 0 +1 +1 +1 +1 0)
     --------------
            0000000
           0101101
          0101101
         0101101
        0101101
       0000000
      0000000
     --------------
     00010101000110

Booth Algorithm
Since 0011110 = 0100000 - 0000010 (that is, 2^5 - 2^1), what will happen if we use the expression on the right?

            0101101
                         (digits: 0 +1 0 0 0 -1 0)
     --------------
     00000000000000
     1111111010011       2's complement of the multiplicand
     000000000000
     00000000000
     0000000000
     000101101
     00000000
     --------------
     00010101000110

Booth Algorithm

In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

    Multiplier:  0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
    Recoded:     0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Figure 6.10. Booth recoding of a multiplier.

Booth Algorithm
          01101    (+13)
        × 11010    (-6)      Booth digits: 0 -1 +1 -1 0
     ----------
     0000000000
     111110011
     00001101
     1110011
     000000
     ----------
     1110110010    (-78)

Figure 6.11. Booth multiplication with a negative multiplier.

Booth Algorithm
    Multiplier          Version of multiplicand
    Bit i   Bit i-1     selected by bit i
      0       0            0 × M
      0       1           +1 × M
      1       0           -1 × M
      1       1            0 × M

Figure 6.12. Booth multiplier recoding table.
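The recoding table translates into a one-line rule: the digit at position i is (bit i-1) minus (bit i). The following Python sketch is illustrative (`booth_recode` and `booth_multiply` are assumed names); it takes the multiplier as a signed Python integer of width n and sums the selected versions of the multiplicand.

```python
def booth_recode(q: int, n: int) -> list[int]:
    """Booth digits (least significant first) of an n-bit 2's-complement multiplier."""
    digits = []
    prev = 0                          # implied 0 to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1            # Python's >> sign-extends negative ints
        digits.append(prev - bit)     # 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0
        prev = bit
    return digits


def booth_multiply(m: int, q: int, n: int) -> int:
    """Sum of the selected versions of the multiplicand, each shifted into place."""
    return sum(d * (m << i) for i, d in enumerate(booth_recode(q, n)))


print(booth_recode(-6, 5))            # digits for 11010, LSB first
print(booth_multiply(13, -6, 5))      # (+13) × (-6)
```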

Booth Algorithm

Best case: a long string of 1s (skipping over 1s).
Worst case: 0s and 1s alternating.

    Worst-case multiplier
         0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
        +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

    Ordinary multiplier
         1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
         0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0

    Good multiplier
         0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
         0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1

Booth Algorithm - Advantages


Handles both positive and negative multipliers uniformly
Efficient if large blocks of ones exist
On average, the speed is the same as that of the normal algorithm

Fast Multiplication


Bit-Pair Recoding of Multipliers

Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).
    Sign extension -> 1 | 1  1  0  1  0 | 0 <- implied 0 to right of LSB

    Booth recoding:          0 -1 +1 -1  0
    Bit-pair recoding:       0    -1    -2

(a) Example of bit-pair recoding derived from Booth recoding

Bit-Pair Recoding of Multipliers


    Multiplier bit-pair    Multiplier bit on the right    Multiplicand selected
      i+1      i                     i-1                      at position i
       0       0                      0                          0 × M
       0       0                      1                         +1 × M
       0       1                      0                         +1 × M
       0       1                      1                         +2 × M
       1       0                      0                         -2 × M
       1       0                      1                         -1 × M
       1       1                      0                         -1 × M
       1       1                      1                          0 × M

(b) Table of multiplicand selection decisions

Bit-Pair Recoding of Multipliers


          01101    (+13)
        × 11010    (-6)

Booth recoding (digits 0 -1 +1 -1 0), five summands:

     0000000000
     111110011
     00001101
     1110011
     000000
     ----------
     1110110010    (-78)

Bit-pair recoding (digits 0 -1 -2), three summands:

     1111100110        -2 × M
     11110011          -1 × M
     000000             0 × M
     ----------
     1110110010    (-78)

Figure 6.15. Multiplication requiring only n/2 summands.
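The selection table for bit pairs reduces to the expression -2·b(i+1) + b(i) + b(i-1). Below is a minimal Python sketch of that rule; the names `bit_pair_recode` and `bit_pair_multiply` are illustrative, and the multiplier is assumed to be passed as a signed Python integer so that sign extension to an even width comes for free.

```python
def bit_pair_recode(q: int, n: int) -> list[int]:
    """Digits in {-2, ..., +2}, least significant pair first, for an n-bit multiplier."""
    if n % 2:
        n += 1                                     # sign-extend to an even width
    digits = []
    for i in range(0, n, 2):
        right = (q >> (i - 1)) & 1 if i else 0     # bit i-1 (implied 0 when i = 0)
        b_i = (q >> i) & 1
        b_i1 = (q >> (i + 1)) & 1
        digits.append(-2 * b_i1 + b_i + right)
    return digits


def bit_pair_multiply(m: int, q: int, n: int) -> int:
    """One summand per bit pair, each shifted two positions further left."""
    return sum(d * (m << (2 * k)) for k, d in enumerate(bit_pair_recode(q, n)))


print(bit_pair_recode(-6, 5))         # pairs for 11010 with sign extension
print(bit_pair_multiply(13, -6, 5))
```

For the 5-bit multiplier -6 this produces three digits, so only three summands are added instead of five.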

Carry-Save Addition of Summands


         1101    (13)  Multiplicand M
       × 1011    (11)  Multiplier Q
       ------
         1101
        1101
       0000
      1101
     --------
     10001111    (143) Product P

(a) Manual multiplication algorithm

Carry-Save Addition of Summands


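[Figure (b): Array implementation. The multiplicand bits m_3...m_0 feed every row. Partial product PP0 = 0 enters the first row; row i is controlled by multiplier bit q_i and turns the incoming partial product PPi into PP(i+1); PP4 = p_7, p_6, ..., p_0 = Product.
 Typical cell: a full adder (FA) with the bit of the incoming partial product (PPi), carry-in, carry-out, and the bit of the outgoing partial product [PP(i+1)].]

(b) Array implementation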

Carry-Save Addition of Summands

CSA speeds up the addition process.


[Figure 6.16(a): Ripple-carry array. The first row of full adders combines the terms m_j q_0 and m_j q_1, the second row adds the terms m_j q_2, and the third row adds the terms m_j q_3. Carries ripple along each row. Outputs p_7 ... p_0.]

(a) Ripple-carry array (Figure 6.6 structure)

Carry-Save Addition of Summands


[Figure 6.16(b): Carry-save array. The first row of full adders takes the terms m_j q_0, m_j q_1 and m_j q_2 as three inputs each; the second row adds the terms m_j q_3 to the resulting sum and carry bits; a final row of full adders with rippling carries produces the upper product bits. Outputs p_7 ... p_0.]

(b) Carry-save array

Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M × Q = P for 4-bit operands.

Carry-Save Addition of Summands

The delay through the carry-save array is somewhat less than the delay through the ripple-carry array. This is because the S and C vector outputs from each row are produced in parallel in one full-adder delay.

To add many summands, we can:
1. Group the summands in threes and perform carry-save addition on each of these groups in parallel, generating a set of S and C vectors in one full-adder delay.
2. Group all of the S and C vectors into threes and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay.
3. Continue with this process until only two vectors remain.
4. Add the two remaining vectors in an RCA or CLA to produce the desired product.
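The grouping procedure can be written as a short loop. This Python sketch is illustrative and assumes non-negative summands held as integers; `carry_save_add` and `csa_tree_sum` are assumed names. Each carry-save adder turns three vectors into a sum vector S and a carry vector C (shifted left one position) without propagating carries.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3-to-2 reduction: returns (S, C) such that x + y + z == S + C."""
    s = x ^ y ^ z                                   # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1          # per-bit carry, moved one place left
    return s, c


def csa_tree_sum(summands: list[int]) -> tuple[int, int]:
    """Reduce the summands level by level; returns (total, number of CSA levels)."""
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[full:])                  # leftovers wait for the next level
        vectors = nxt
        levels += 1
    return sum(vectors), levels                     # final carry-propagate addition


# Six summands of 45 × 63: the multiplicand shifted by 0..5 positions.
print(csa_tree_sum([45 << i for i in range(6)]))
```

For six summands the loop runs three levels before the final two-operand addition.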

Carry-Save Addition of Summands


                101101    (45)     M
              × 111111    (63)     Q
          ------------
                101101             A
               101101              B
              101101               C
             101101                D
            101101                 E
           101101                  F
          ------------
          101100010011    (2,835)  Product

Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.

    M = 101101 (45), Q = 111111 (63). All vectors are shown as 12-bit words.

    Level 1
        A     000000101101
        B     000001011010
        C     000010110100
        S1    000011000011
        C1    000001111000

        D     000101101000
        E     001011010000
        F     010110100000
        S2    011000011000
        C2    001111000000

    Level 2
        S1    000011000011
        C1    000001111000
        S2    011000011000
        S3    011010100011
        C3    000010110000

    Level 3
        S3    011010100011
        C3    000010110000
        C2    001111000000
        S4    010111010011
        C4    010101000000

    Final addition
        S4    010111010011
      + C4    010101000000
              ------------
              101100010011    Product

Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition.

Carry-Save Addition of Summands


[Figure 6.19: Level 1 CSAs reduce A, B, C to S1, C1 and D, E, F to S2, C2. A level 2 CSA reduces S1, C1, S2 to S3, C3. A level 3 CSA reduces S3, C3, C2 to S4, C4. A final addition of S4 and C4 gives the product.]

Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.

Carry-Save Addition of Summands


When the number of summands is large, the time saved is proportionally much greater.

Some omitted issues:
Sign extension
Computation width of the final CLA/RCA
Bit-pair recoding

Integer Division


Manual Division
Decimal:

         21
   13 ) 274
        26
        ---
         14
         13
         --
          1

Binary:

              10101
   1101 ) 100010010
           1101
           -----
            10000
             1101
            -----
               1110
               1101
               ----
                  1

Figure 6.20. Longhand division examples.

Longhand Division Steps

Position the divisor appropriately with respect to the dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.

Circuit Arrangement
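[Figure 6.21: Register A (a_n, a_{n-1}, ..., a_0) and register Q (q_{n-1}, ..., q_0, initially the dividend) shift left together. An (n+1)-bit adder combines A with the divisor M (m_{n-1}, ..., m_0) under Add/Subtract control from the control sequencer, which also performs the quotient setting of q_0.]

Figure 6.21. Circuit arrangement for binary division.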

Restoring Division
Shift A and Q left one binary position.
Subtract M from A, and place the answer back in A.
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.

Repeat these steps n times.
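The three steps map directly onto a loop. The Python sketch below is illustrative (the name `restoring_divide` is an assumption); a signed Python integer stands in for the (n+1)-bit register A, so "the sign of A is 1" becomes `a < 0`. It assumes an unsigned n-bit dividend and a positive divisor.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Unsigned n-bit restoring division; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                     # subtract M from A
        if a < 0:                  # sign of A is 1
            a += m                 # restore A; q0 stays 0
        else:
            q |= 1                 # set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b11, 4))   # 8 ÷ 3
```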

Examples

           10
    11 ) 1000
          11
          --
           10

    M = 00011

                               A        Q
    Initially                00000     1000

    First cycle   Shift      00001     000_
                  Subtract   11110
                  Set q0     11110     0000
                  Restore    00001     0000

    Second cycle  Shift      00010     000_
                  Subtract   11111
                  Set q0     11111     0000
                  Restore    00010     0000

    Third cycle   Shift      00100     000_
                  Subtract   00001
                  Set q0     00001     0001

    Fourth cycle  Shift      00010     001_
                  Subtract   11111
                  Set q0     11111     0010
                  Restore    00010     0010

                           Remainder  Quotient

Figure 6.22. A restoring division example.

Nonrestoring Division

Avoid the need for restoring A after an unsuccessful subtraction. Any idea?

If A is positive: shift left and subtract, giving 2A - M.
If A is negative: restore, shift, subtract gives A + M, then 2(A + M), then 2(A + M) - M = 2A + M. So a negative A can simply be shifted left and have M added.

Step 1 (repeat n times): If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
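Steps 1 and 2 can be sketched as follows. This Python model is illustrative (`nonrestoring_divide` is an assumed name); as with the hardware register, A may stay negative between cycles and is only corrected once at the end. It assumes an unsigned n-bit dividend and a positive divisor.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Unsigned n-bit nonrestoring division; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):                           # Step 1, repeated n times
        was_negative = a < 0                     # sign of A before the shift
        a = (a << 1) | ((q >> (n - 1)) & 1)      # shift A and Q left
        q = (q << 1) & mask
        a = a + m if was_negative else a - m     # add M or subtract M
        if a >= 0:
            q |= 1                               # set q0 to 1
    if a < 0:                                    # Step 2: restore the remainder
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b11, 4))      # 8 ÷ 3
```

Python's shift and OR on negative integers follow 2's-complement semantics, which is why the shifted-in dividend bit lands correctly even when A is negative.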

Examples

    M = 00011

                               A        Q
    Initially                00000     1000

    First cycle   Shift      00001     000_
                  Subtract   11110
                  Set q0     11110     0000

    Second cycle  Shift      11100     000_
                  Add        11111
                  Set q0     11111     0000

    Third cycle   Shift      11110     000_
                  Add        00001
                  Set q0     00001     0001

    Fourth cycle  Shift      00010     001_
                  Subtract   11111
                  Set q0     11111     0010
                                      Quotient

    Restore remainder
                  Add        11111
                           + 00011
                             -----
                             00010
                           Remainder

Figure 6.23. A nonrestoring division example.

Floating-Point Numbers and Operations


Floating-Point Numbers

So far we have dealt with fixed-point numbers (what are they?), and have considered them as integers.
Fixed-point numbers can also be treated as fractions, with the binary point just to the right of the sign bit:

$$B = b_0 . b_{-1} b_{-2} \ldots b_{-(n-1)}$$

$$F(B) = -b_0 \times 2^0 + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$$

where the range of F is:

$$-1 \le F \le 1 - 2^{-(n-1)}$$

Floating-point numbers: the position of the binary point is variable and is automatically adjusted as computation proceeds.

Floating-Point Numbers
What is needed to represent a floating-point decimal number?
Sign
Mantissa (the significant digits)
Exponent to an implied base (scale factor)

Normalized: the decimal point is placed to the right of the first (nonzero) significant digit.

IEEE Standard for Floating-Point Numbers

Think about this number (all digits are decimal): $\pm X_1.X_2X_3X_4X_5X_6X_7 \times 10^{\pm Y_1Y_2}$
It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits: a 24-bit mantissa (including 1 sign bit for a signed number) and an 8-bit exponent.
Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.

IEEE Standard
(a) Single precision (32 bits)

    | S | E' | M |

    S:   sign of number (0 signifies +, 1 signifies -)
    E':  8-bit signed exponent in excess-127 representation
    M:   23-bit mantissa fraction

    Value represented = $\pm 1.M \times 2^{E'-127}$

(b) Example of a single-precision number

    | 0 | 00101000 | 001010...0 |

    E' = (00101000)_2 = 40_10, so E = 40 - 127 = -87
    Value represented = $+1.001010\ldots0 \times 2^{-87}$

(c) Double precision (64 bits)

    | S | E' | M |

    S:   sign
    E':  11-bit excess-1023 exponent
    M:   52-bit mantissa fraction

    Value represented = $\pm 1.M \times 2^{E'-1023}$

Figure 6.24. IEEE standard floating-point formats.
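The single-precision layout can be unpacked with plain shifts and masks. The Python sketch below is illustrative (`decode_single` is an assumed name) and handles only normal numbers, that is, stored exponents from 1 to 254. It rebuilds the example pattern from part (b) and cross-checks the result against the machine's own interpretation via the standard `struct` module.

```python
import struct


def decode_single(bits: int) -> tuple[int, int, int, float]:
    """Split a 32-bit pattern into (S, E', M) and evaluate it as a normal number."""
    s = bits >> 31
    e_stored = (bits >> 23) & 0xFF          # E' = E + 127
    m = bits & 0x7FFFFF                     # 23-bit mantissa fraction
    value = (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e_stored - 127)
    return s, e_stored, m, value


# Example (b): S = 0, E' = 00101000, M = 001010 followed by zeros.
pattern = (0b00101000 << 23) | (0b001010 << 17)
s, e_stored, m, value = decode_single(pattern)
print(s, e_stored, e_stored - 127, value)

# Cross-check against the hardware interpretation of the same 32 bits.
assert value == struct.unpack(">f", struct.pack(">I", pattern))[0]
```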

IEEE Standard
For excess-127 format, 0 ≤ E' ≤ 255. However, 0 and 255 are used to represent special values, so actually 1 ≤ E' ≤ 254. That means -126 ≤ E ≤ 127.
Single precision uses 32 bits. The scale factor ranges from 2^-126 to 2^+127.
Double precision uses 64 bits. The scale factor ranges from 2^-1022 to 2^+1023.

Two Aspects

If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.

(a) Unnormalized value

    | 0 | 10001000 | 0010110... |
          excess-127 exponent

    (There is no implicit 1 to the left of the binary point.)
    E' = (10001000)_2 = 136_10, so E = 136 - 127 = 9
    Value represented = $+0.0010110\ldots \times 2^{9}$

(b) Normalized version

    | 0 | 10000101 | 0110... |

    Value represented = $+1.0110\ldots \times 2^{6}$
    E' = 6 + 127 = 133_10 = (10000101)_2

Figure 6.25. Floating-point normalization in IEEE single-precision format.

Two Aspects
As computations proceed, a number that does not fall in the representable range of normal numbers might be generated.
It requires an exponent less than -126 (underflow) or greater than +127 (overflow). Both are exceptions that need to be considered.

Special Values

The end values 0 and 255 of E' are used to represent special values.
When E' = 0 and M = 0, the value exact 0 is represented (±0).
When E' = 255 and M = 0, the value ∞ is represented (±∞).
When E' = 0 and M ≠ 0, denormal numbers are represented. The value is ±0.M × 2^-126.
When E' = 255 and M ≠ 0, the pattern is Not a Number (NaN).
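The four special cases depend only on the stored exponent and mantissa fields. A minimal Python sketch follows; the names `classify_single`, `denormal_value` and `bits_of` are illustrative, and `struct` is used only to obtain real 32-bit patterns to test against.

```python
import struct


def bits_of(x: float) -> int:
    """32-bit single-precision pattern of x (x is rounded to single precision)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_single(bits: int) -> str:
    e_stored = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e_stored == 0:
        return "zero" if m == 0 else "denormal"
    if e_stored == 255:
        return "infinity" if m == 0 else "NaN"
    return "normal"


def denormal_value(bits: int) -> float:
    """Value of a denormal pattern: no implicit 1, fixed scale factor 2^-126."""
    s = bits >> 31
    m = bits & 0x7FFFFF
    return (-1) ** s * (m / 2 ** 23) * 2.0 ** -126


for x in (0.0, -0.0, 1.0, float("inf"), float("nan")):
    print(x, classify_single(bits_of(x)))
print(classify_single(0x00000001), denormal_value(0x00000001))
```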

Exceptions
A processor must set exception flags if any of the following occur in performing operations: underflow, overflow, divide by zero, inexact (requires rounding), invalid (0/0).
When an exception occurs, the results are set to special values.

Arithmetic Operations on Floating-Point Numbers

Add/Subtract rule
1. Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the result.
4. Normalize the resulting value, if necessary.

Multiply rule
1. Add the exponents and subtract 127.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.

Divide rule
1. Subtract the exponents and add 127.
2. Divide the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
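As an illustration of the multiply rule, the Python sketch below works on the stored fields of two single-precision operands. It is a simplified model under stated assumptions: both operands and the result are normal numbers, the extra product bits are simply chopped, and no exception handling is done. The helper names `to_fields`, `from_fields` and `fp_multiply` are illustrative.

```python
import struct


def to_fields(x: float) -> tuple[int, int, int]:
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def from_fields(s: int, e_stored: int, m: int) -> float:
    return struct.unpack(">f", struct.pack(">I", (s << 31) | (e_stored << 23) | m))[0]


def fp_multiply(a: float, b: float) -> float:
    """Multiply rule for normal single-precision operands (chopped result)."""
    sa, ea, ma = to_fields(a)
    sb, eb, mb = to_fields(b)
    sign = sa ^ sb                                   # sign of the result
    e = ea + eb - 127                                # add exponents, subtract 127
    prod = ((1 << 23) | ma) * ((1 << 23) | mb)       # 1.M × 1.M with 46 fraction bits
    if prod >> 47:                                   # product in [2, 4): normalize
        prod >>= 1
        e += 1
    m = (prod >> 23) & 0x7FFFFF                      # keep 23 fraction bits (chopping)
    return from_fields(sign, e, m)


print(fp_multiply(1.5, 2.5))
print(fp_multiply(-3.0, 3.0))
```

The excess-127 bias is subtracted once because each stored exponent already contains it; adding the two stored values would otherwise count the bias twice.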

Guard Bits and Truncation


During the intermediate steps, it is important to retain extra bits, often called guard bits, to yield the maximum accuracy in the final results.
Removing the guard bits in generating a final result requires truncation of the extended mantissa. How?

Guard Bits and Truncation


Chopping: biased, error from 0 to 1 at the LSB.
    All fractions in the range 0.b-1b-2b-3 000 to 0.b-1b-2b-3 111 are truncated to 0.b-1b-2b-3.

Von Neumann rounding (if any of the bits to be removed are 1, the LSB of the retained bits is set to 1): unbiased, error from -1 to +1 at the LSB.
    All 6-bit fractions with b-4b-5b-6 not equal to 000 are truncated to 0.b-1b-2 1.
    Why is unbiased rounding better when many operands are involved?

Rounding (a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed): unbiased, error from -1/2 to +1/2 at the LSB.
    Round to the nearest number, or to the nearest even number in case of a tie:
    0.b-1b-2 0 100 is rounded to 0.b-1b-2 0, and 0.b-1b-2 1 100 is rounded to 0.b-1b-2 1 + 0.001.
    Best accuracy
    Most difficult to implement
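The three truncation methods can be compared on small bit patterns. In the Python sketch below a fraction is held as an integer of its bits, and `remove` is the number of low-order guard bits to drop (assumed to be at least 1); the function names are illustrative. The rounding function may carry out of the retained bits, in which case a real unit would renormalize.

```python
def chop(bits: int, remove: int) -> int:
    """Chopping: drop the removed bits."""
    return bits >> remove


def von_neumann(bits: int, remove: int) -> int:
    """Set the LSB of the retained bits to 1 if any removed bit is 1."""
    kept = bits >> remove
    if bits & ((1 << remove) - 1):
        kept |= 1
    return kept


def round_nearest_even(bits: int, remove: int) -> int:
    """Round to nearest; on a tie, round to the value whose LSB is 0."""
    kept = bits >> remove
    removed = bits & ((1 << remove) - 1)
    half = 1 << (remove - 1)               # pattern 100...0 in the removed bits
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


# 6-bit fractions truncated to 3 bits.
for pattern in (0b100100, 0b101100, 0b100001, 0b100111):
    print(format(pattern, "06b"),
          format(chop(pattern, 3), "03b"),
          format(von_neumann(pattern, 3), "03b"),
          format(round_nearest_even(pattern, 3), "03b"))
```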

Implementing Floating-Point Operations


Hardware/software
In most general-purpose processors, floating-point operations are available at the machine-instruction level, implemented in hardware.
In high-performance processors, a significant portion of the chip area is assigned to floating-point operations.
Addition/subtraction circuitry

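[Figure 6.26: 32-bit operands A: S_A, E'_A, M_A and B: S_B, E'_B, M_B. An 8-bit subtractor forms n = |E'_A - E'_B| and its sign. SWAP routes the M of the number with the smaller E' to the SHIFTER (n bits to right) and the M of the number with the larger E' to the mantissa adder/subtractor. A combinational CONTROL network takes S_A, S_B, the Add/Subtract input and the subtractor sign and produces the Add/Sub control and the result sign. A MUX selects the larger of E'_A and E'_B as E'. A leading zeros detector produces X from the magnitude M; the result mantissa is formed by normalize and round, and a second 8-bit subtractor forms E' - X. 32-bit result R = A + B: S_R, E'_R, M_R.]

Figure 6.26. Floating-point addition-subtraction unit.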
