2 2ArithmeticFull PDF
Arithmetic
Outline
A basic operation in all digital computers is the addition or subtraction of two numbers.
ALU: AND, OR, NOT, XOR
Unsigned/signed numbers
Addition/subtraction
Multiplication
Division
Floating-point number operation
Adders
Logic for stage i of binary addition (inputs xi, yi, carry-in ci; outputs si, carry-out ci+1; a trailing ' marks a complemented variable):
si = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci = xi xor yi xor ci
ci+1 = yi ci + xi ci + xi yi
Example: X + Y = Z, that is 7 + 6 = 13:
    X    0 1 1 1
  + Y    0 1 1 0
    Z    1 1 0 1
Figure 6.1. Logic specification for a stage of binary addition.
(a) Logic for a single stage: a full adder (FA) with inputs xi, yi, ci and outputs si, ci+1.
(b) An n-bit ripple-carry adder: n FA stages in a chain. The carry enters at c0, s0 is the least significant bit (LSB) position, sn-1 is the most significant bit (MSB) position, and the carry leaves at cn.
(c) Cascade of k n-bit adders: the carry-out of each n-bit adder feeds the next, from c0 through cn up to ckn, giving sum bits s0 through skn-1.
Figure 6.2. Logic for addition of binary vectors.
Subtraction?
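The single-stage logic and the ripple-carry chain of Figure 6.2 can be sketched in a few lines of Python. This is only an illustration: the function names and the LSB-first bit-list convention are assumptions made here, not part of the slides.

```python
def full_adder(x, y, c):
    """One stage of binary addition: returns (sum bit, carry-out)."""
    s = x ^ y ^ c
    c_out = (y & c) | (x & c) | (x & y)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first); returns (sum bits, carry-out)."""
    carry = c0
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)   # the carry ripples to the next stage
        s_bits.append(s)
    return s_bits, carry


def to_bits(value, n):
    """Unsigned integer to a list of n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) back to an unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))


s, c = ripple_carry_add(to_bits(7, 4), to_bits(6, 4))
print(from_bits(s), c)   # the 7 + 6 example from Figure 6.1
```

Because each stage waits for the carry of the previous one, the loop mirrors the long critical path of the hardware.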
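Addition/subtraction logic unit
Figure 6.3. Binary addition-subtraction logic network: an n-bit adder with inputs xn-1 ... x1 x0 and yn-1 ... y1 y0, an Add/Sub control line, carry-in c0, carry-out cn, and sum outputs sn-1 ... s1 s0.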
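A minimal sketch of how one adder can serve both operations. It assumes the usual arrangement in which the Add/Sub control line both complements every y bit (through XOR gates) and drives the carry-in c0; that wiring is not legible in the figure text, and the name `add_sub` is hypothetical.

```python
def add_sub(x, y, sub, n=4):
    """n-bit adder/subtractor on 2's-complement bit patterns held as unsigned ints.

    sub = 0 computes x + y; sub = 1 computes x - y as x + (not y) + 1.
    Assumption: the control line XORs each y bit and also feeds c0.
    Returns (n-bit result pattern, carry-out cn).
    """
    mask = (1 << n) - 1
    y_in = (y ^ (mask if sub else 0)) & mask   # XOR gates on the y inputs
    total = (x & mask) + y_in + sub            # sub doubles as carry-in c0
    return total & mask, (total >> n) & 1


print(add_sub(7, 6, 0))   # addition
print(add_sub(2, 5, 1))   # subtraction; the result is a 2's-complement pattern
```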
Ripple-carry adder design:
Simple circuit structure
Easy to understand
Most power efficient
Slowest (critical path is too long: 2n gate delays)
Adders
Carry-lookahead Logic
Carry Generate Gi = Ai Bi: must generate carry when A = B = 1
Carry Propagate Pi = Ai xor Bi: carry-in will equal carry-out here
Sum and Carry can be re-expressed in terms of generate/propagate/Ci:
Si = Ai xor Bi xor Ci = Pi xor Ci
Ci+1 = Ai Bi + Ai Ci + Bi Ci
     = Ai Bi + Ci (Ai + Bi)
     = Ai Bi + Ci (Ai xor Bi)
     = Gi + Ci Pi
Carry-lookahead Logic
Re-express the carry logic as follows:
C1 = G0 + P0 C0
C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
Each of the carry equations can be implemented in a two-level logic network.
Variables are the adder inputs and the carry-in to stage 0!
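The expanded carry equations can be checked with a short Python sketch. Every carry below is written directly in terms of the Gi, Pi, and C0 signals, so no carry depends on another carry. The function name `cla_4bit` is an assumption made for this illustration.

```python
def cla_4bit(a, b, c0=0):
    """4-bit carry-lookahead addition of a and b (0..15); returns (sum, C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]        # carry generate
    P = [A[i] ^ B[i] for i in range(4)]        # carry propagate
    # Each carry is a two-level AND-OR function of G, P and C0 only.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    C = [c0, c1, c2, c3]
    s = sum((P[i] ^ C[i]) << i for i in range(4))   # Si = Pi xor Ci
    return s, c4


print(cla_4bit(7, 6))
```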
Carry-lookahead Implementation
(a) Pi and Gi are formed from Ai and Bi: Pi @ 1 gate delay, Gi @ 1 gate delay, Si @ 2 gate delays.
Remaining panels: two-level networks that produce C1 from C0, P0, G0; C2 from C0, P0, P1, G0, G1; C3 from C0, P0 to P2, G0 to G2; and C4.
Pi and Gi are obtained in 1 gate delay (fig. (a)). Ci needs 2 more gate delays (fig. (b)), for a total of 3 gate delays for Ci. Si needs one more gate delay, so the sum bits take four gate delays.
Carry-lookahead Logic
Cascaded carry lookahead: in a 4-bit adder, the carry-lookahead logic generates the individual carries, so the sums are computed much faster.
Timing for one 4-bit adder with inputs C0, A0-A3, B0-B3 (gate delays): C1 @3, C2 @3, C3 @3, C4 @3; S0 @2, S1 @4, S2 @4, S3 @4.
Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4b).
When four 4-bit adders are cascaded, c4 is available after 3 gate delays, c8 after 2 more (5), c12 after 2 more (7), and c16 after 2 more (9). The sum needs 1 more gate delay, for a total of 10 gate delays, compared with 32 for the ripple-carry adder (RCA).
Carry-lookahead Logic
Figure: four 4-bit adders with internal carry lookahead, for A[3-0]/B[3-0] through A[15-12]/B[15-12], plus a second-level carry-lookahead unit that extends lookahead to 16 bits. Timing annotations (gate delays): C0 @0; each adder's P @2 and G @3; P3-0 @3 and G3-0 @5; carries from the lookahead unit @5; S[3-0] @4, S[7-4] @7, S[11-8] @8, S[15-12] @8.
Group Propagate P = P3 P2 P1 P0
Group Generate G = G3 + G2 P3 + G1 P3 P2 + G0 P3 P2 P1
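The group propagate and generate signals can be sketched as follows. The helper names are hypothetical, and the block carries are computed in a loop for brevity, whereas the second-level lookahead unit would flatten them into two-level logic just as in the 4-bit case.

```python
def block_pg(a, b):
    """Group propagate and group generate for one 4-bit block (a, b in 0..15)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    group_p = p[3] & p[2] & p[1] & p[0]
    group_g = (g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
               | (g[0] & p[3] & p[2] & p[1]))
    return group_p, group_g


def block_carries_16(a, b, c0=0):
    """Carries c4, c8, c12, c16 of a 16-bit addition from the block P and G."""
    carries = []
    c = c0
    for k in range(4):
        P, G = block_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF)
        c = G | (P & c)          # same form as Ci+1 = Gi + Pi Ci, one level up
        carries.append(c)
    return carries


print(block_carries_16(0x0FFF, 0x0001))
```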
Unsigned Multiplication
          1 1 0 1    (13) Multiplicand M
        x 1 0 1 1    (11) Multiplier Q
          1 1 0 1
        1 1 0 1
      0 0 0 0
    1 1 0 1
  1 0 0 0 1 1 1 1    (143) Product P
(a) Manual multiplication algorithm
Array Multiplication
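(b) Array implementation. The first partial product (PP0) is 0, and each row adds one shifted multiple of the multiplicand to form PP1, PP2, PP3 and finally the product bits. In a typical cell, multiplicand bit mj is ANDed with multiplier bit qi and fed to a full adder (FA) together with a bit of the incoming partial product (PPi) and a carry-in; the cell produces a carry-out and a bit of the outgoing partial product [PP(i+1)].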
Array Multiplication
What is the critical path (worst-case signal propagation delay path)? Assuming that there are two gate delays from the inputs to the outputs of a full adder block, the path has a total of 6(n-1)-1 gate delays, including the initial AND gate delay in all cells, for the n x n array.
Any advantages/disadvantages?
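(a) Register configuration: multiplicand register M (mn-1 ... m0), register A (initially 0), multiplier register Q, and carry bit C; C, A, and Q shift right together.
(b) Multiplication example, M = 1 1 0 1:

                          C     A       Q
  Initial configuration   0   0000    1011
  First cycle    Add      0   1101    1011
                 Shift    0   0110    1101
  Second cycle   Add      1   0011    1101
                 Shift    0   1001    1110
  Third cycle    No add   0   1001    1110
                 Shift    0   0100    1111
  Fourth cycle   Add      1   0001    1111
                 Shift    0   1000    1111

  Product (A, Q) = 1000 1111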
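The register-level procedure of the multiplication example can be sketched as below. The function name and the use of Python integers for the C, A, and Q registers are assumptions made for illustration.

```python
def shift_add_multiply(m, q, n=4):
    """Unsigned n-bit multiplication with registers C, A, Q; returns the 2n-bit product."""
    mask = (1 << n) - 1
    c, a = 0, 0
    for _ in range(n):
        if q & 1:                              # q0 = 1: add M to A, carry goes to C
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right one position as one long register.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q


print(shift_add_multiply(0b1101, 0b1011))   # the 13 x 11 example
```

After n cycles the high half of the product sits in A and the low half in Q, which is why the multiplier register can be reused for the result.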
Signed Multiplication
Considering 2's-complement signed operands, what will happen to (-13) x (+11) if we follow the same method as for unsigned multiplication? The multiplicand must be sign-extended in every summand (the leading 1s below):

            1 0 0 1 1    (-13)
          x 0 1 0 1 1    (+11)
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  1 1 0 1 1 1 0 0 0 1    (-143)

Figure 6.8. Sign extension of negative multiplicand.
Signed Multiplication
For a negative multiplier, a straightforward solution is to form the 2's-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier. This is possible because complementation of both operands does not change the value or the sign of the product.
A technique that works equally well for both negative and positive multipliers is the Booth algorithm.
Booth Algorithm
Consider a multiplication in which the multiplier is the positive number 0011110. How many appropriately shifted versions of the multiplicand are added in a standard procedure?
Normal multiplication scheme: multiplicand 0 1 0 1 1 0 1, multiplier 0 0 +1 +1 +1 +1 0. Four shifted versions of the multiplicand are added, giving the product 0 0 0 1 0 1 0 1 0 0 0 1 1 0.
Booth Algorithm
Since 0011110 = 0100000 - 0000010 (that is, 2^5 - 2^1), what will happen if we use the expression on the right?
Booth multiplication scheme: multiplicand 0 1 0 1 1 0 1, recoded multiplier 0 +1 0 0 0 -1 0. Only two summands are needed: the 2's complement of the multiplicand at bit position 1 and the multiplicand itself at bit position 5. The product is again 0 0 0 1 0 1 0 1 0 0 0 1 1 0.
Booth Algorithm
In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.
Multiplier: 0 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 0 0
Recoded:    0 +1 -1 +1 0 -1 0 +1 0 0 -1 +1 -1 +1 0 -1 0 0
Figure 6.10. Booth recoding of a multiplier.
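Booth recoding is easy to express in code: each digit is the bit to the right minus the bit itself, with an implied 0 to the right of the LSB. The sketch below is illustrative, and `booth_recode` with its MSB-first string input is a convention chosen here.

```python
def booth_recode(bits):
    """Booth-recode a multiplier given as a string of 0s and 1s, MSB first.

    Returns a list of digits in {-1, 0, +1}, MSB first.
    """
    padded = bits + "0"                      # implied 0 to the right of the LSB
    # Moving right to left: 0 -> 1 gives -1, 1 -> 0 gives +1, no change gives 0.
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


print(booth_recode("0011110"))
print(booth_recode("001011001110101100"))
```

The first call reproduces the 2^5 - 2^1 decomposition of 0011110, and the second reproduces the recoding of the 18-bit multiplier in Figure 6.10.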
Booth Algorithm
            0 1 1 0 1    (+13)
          x 1 1 0 1 0    (-6)
The multiplier 1 1 0 1 0 is recoded as 0 -1 +1 -1 0, which selects these summands:
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  1 1 1 0 1 1 0 0 1 0    (-78)
Figure 6.11. Booth multiplication with a negative multiplier.
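A minimal sketch of Booth multiplication on signed operands. It relies on Python integers behaving like sign-extended 2's-complement values under `>>` and `<<`; the name `booth_multiply` and the parameter `n` (multiplier width in bits) are assumptions for this illustration.

```python
def booth_multiply(m, q, n):
    """Multiply signed m by signed q, where q fits in n-bit 2's complement."""
    product = 0
    prev = 0                                 # implied 0 to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1                   # >> sign-extends negative q
        digit = prev - bit                   # Booth digit in {-1, 0, +1}
        product += digit * (m << i)          # add 0, +M or -M, shifted i places
        prev = bit
    return product


print(booth_multiply(13, -6, 5))             # the example of Figure 6.11
```

No special case is needed for a negative multiplier, which is the point of the recoding.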
Booth Algorithm
  Multiplier            Version of multiplicand
  Bit i    Bit i-1      selected by bit i
    0        0             0 x M
    0        1            +1 x M
    1        0            -1 x M
    1        1             0 x M
Figure 6.12. Booth multiplier recoding table.
Booth Algorithm
Best case: a long string of 1s (skipping over 1s).
Worst case: 0s and 1s alternating.

Worst-case multiplier:
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
  +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1
Ordinary multiplier:
  1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0
  0 -1 0 0 +1 -1 +1 0 -1 +1 0 0 0 -1 0 0
Good multiplier:
  0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1
  0 0 0 +1 0 0 0 0 -1 0 0 0 +1 0 0 -1

The Booth algorithm:
Handles both positive and negative multipliers uniformly
Is efficient if large blocks of ones exist
On average, has the same speed as the normal algorithm
Fast Multiplication
Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).
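Multiplier 1 1 0 1 0 (-6), with one bit of sign extension on the left and an implied 0 to the right of the LSB:
  Sign-extended multiplier:  1 1 1 0 1 0  (0)
  Booth recoding:            0 0 -1 +1 -1 0
  Bit-pair recoding:         0 -1 -2
(a) Example of bit-pair recoding derived from Booth recoding
(b) Table of multiplicand selection decisions (first entry: 0 x M)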
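The selection table for bit-pair recoding can be derived mechanically from Booth recoding: a pair of Booth digits (d1, d0) at positions i+1 and i is worth 2 x d1 + d0 at position i. The sketch below builds the table that way. The function names are hypothetical, and the generated table is a derivation under that rule rather than a copy of the slide's table.

```python
def bit_pair_table():
    """Map (bit i+1, bit i, bit i-1) to the multiple of M selected at position i."""
    table = {}
    for hi in (0, 1):
        for lo in (0, 1):
            for right in (0, 1):
                booth_hi = lo - hi           # Booth digit at position i+1
                booth_lo = right - lo        # Booth digit at position i
                table[(hi, lo, right)] = 2 * booth_hi + booth_lo
    return table


def bit_pair_recode(q, n):
    """Radix-4 digits (MSB first) of signed q; an odd width n is sign-extended by one bit."""
    if n % 2:
        n += 1
    table = bit_pair_table()
    digits = []
    for i in range(0, n, 2):
        right = (q >> (i - 1)) & 1 if i else 0   # implied 0 right of the LSB
        digits.append(table[((q >> (i + 1)) & 1, (q >> i) & 1, right)])
    return digits[::-1]


print(bit_pair_table())
print(bit_pair_recode(-6, 5))
```

Every entry lies in {-2, -1, 0, +1, +2}, so each summand is obtainable from M by at most a shift and a negation.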
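            0 1 1 0 1    (+13)
          x 1 1 0 1 0    (-6)
The multiplier is bit-pair recoded as 0 -1 -2, which selects these summands:
  1 1 1 1 1 0 0 1 1 0    (-2 x M)
  1 1 1 1 0 0 1 1        (-1 x M, shifted left two positions)
  0 0 0 0 0 0            (0 x M, shifted left four positions)
  1 1 1 0 1 1 0 0 1 0    (-78)
Figure 6.15. Multiplication requiring only n/2 summands.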
(a) Ripple-carry array (Figure 6.6 structure)
(b) Carry-save array
In both arrays, the bit products mj qi feed rows of full adders (FA) that produce the product bits p7 ... p0.
Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M x Q = P for 4-bit operands.
The delay through the carry-save array is somewhat less than the delay through the ripple-carry array. This is because the S and C vector outputs from each row are produced in parallel in one full-adder delay.
To add many summands, we can:
1. Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delay.
2. Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay.
3. Continue with this process until there are only two vectors remaining.
4. Add the two remaining vectors in a RCA or CLA to produce the desired product.
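The grouping procedure above can be sketched as follows. A carry-save adder turns three vectors into a sum vector S and a carry vector C without propagating carries. The function names are assumptions, and Python's built-in addition stands in for the final RCA or CLA.

```python
def carry_save_add(x, y, z):
    """Reduce three summands to vectors S and C such that x + y + z == S + C."""
    s = x ^ y ^ z                                  # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit carry, moved one place left
    return s, c


def csa_tree_sum(summands):
    """Group vectors in threes, level by level, until two remain; then add them.

    Returns (total, number of carry-save levels used).
    """
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[full:])                 # leftovers wait for the next level
        vectors = nxt
        levels += 1
    return sum(vectors), levels                    # final carry-propagate addition


m, q = 45, 63
summands = [m << i for i in range(6) if (q >> i) & 1]
print(csa_tree_sum(summands))
```

For the six summands of 45 x 63 the vector count goes 6, 4, 3, 2, so three carry-save levels precede the final addition.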
              1 0 1 1 0 1    (45)  M
            x 1 1 1 1 1 1    (63)  Q
              1 0 1 1 0 1    A
            1 0 1 1 0 1      B
          1 0 1 1 0 1        C
        1 0 1 1 0 1          D
      1 0 1 1 0 1            E
    1 0 1 1 0 1              F
  1 0 1 1 0 0 0 1 0 0 1 1    (2,835)  Product
Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.
Figure 6.18 applies carry-save addition to the six summands A through F of Figure 6.17:
  A, B, C     -> S1, C1
  D, E, F     -> S2, C2
  S1, C1, S2  -> S3, C3
  S3, C3, C2  -> S4, C4
  S4 + C4     -> Product
Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.
When the number of summands is large, the time saved is proportionally much greater.
Some omitted issues:
Integer Division
Manual Division
Decimal: 274 / 13 gives quotient 21 and remainder 1.
          21
  13 ) 274
       26
        14
        13
         1
Binary: 100010010 / 1101 gives quotient 10101 and remainder 1.
             10101
  1101 ) 100010010
          1101
           10000
            1101
             1110
             1101
                1
Figure 6.20. Longhand division examples.
Position the divisor appropriately with respect to the dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
Circuit Arrangement
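Figure 6.21. Circuit arrangement for binary division: register A (an an-1 ... a0) and the dividend register Q (qn-1 ... q0) shift left together; an (n + 1)-bit adder combines A with the divisor M (mn-1 ... m0); the control logic performs the quotient setting of q0.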
Restoring Division
Repeat the following steps n times:
1. Shift A and Q left one binary position.
2. Subtract M from A, and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.
Examples
Longhand: 1000 / 11 gives quotient 10 and remainder 10. With M = 0 0 0 1 1:

                         A            Q
  Initially          0 0 0 0 0     1 0 0 0
  First cycle
    Shift            0 0 0 0 1     0 0 0 _
    Subtract         1 1 1 1 0
    Set q0, restore  0 0 0 0 1     0 0 0 0
  Second cycle
    Shift            0 0 0 1 0     0 0 0 _
    Subtract         1 1 1 1 1
    Set q0, restore  0 0 0 1 0     0 0 0 0
  Third cycle
    Shift            0 0 1 0 0     0 0 0 _
    Subtract         0 0 0 0 1
    Set q0           0 0 0 0 1     0 0 0 1
  Fourth cycle
    Shift            0 0 0 1 0     0 0 1 _
    Subtract         1 1 1 1 1
    Set q0, restore  0 0 0 1 0     0 0 1 0
                     Remainder     Quotient
Figure 6.22. A restoring division example.
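The restoring algorithm can be sketched directly from its three steps. The name `restoring_divide` is hypothetical, and a signed Python integer stands in for the (n + 1)-bit register A, so "the sign of A is 1" becomes a test for a negative value.

```python
def restoring_divide(dividend, divisor, n=4):
    """Unsigned n-bit restoring division; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                       # subtract M from A
        if a < 0:                    # sign of A is 1: unsuccessful subtraction
            a += m                   # restore A; q0 stays 0
        else:
            q |= 1                   # set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b11))   # the example of Figure 6.22
```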
Nonrestoring Division
Avoid the need for restoring A after an unsuccessful subtraction. Any idea?
If A is positive, we shift left and subtract: 2A - M.
If A is negative, restoring gives A + M, shifting gives 2(A + M), and subtracting gives 2(A + M) - M = 2A + M. So we can shift left and add M instead of restoring.
Step 1 (repeat n times): If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
Examples
Dividend 1 0 0 0 and divisor M = 0 0 0 1 1:

                         A            Q
  Initially          0 0 0 0 0     1 0 0 0
  First cycle
    Shift            0 0 0 0 1     0 0 0 _
    Subtract         1 1 1 1 0
    Set q0           1 1 1 1 0     0 0 0 0
  Second cycle
    Shift            1 1 1 0 0     0 0 0 _
    Add              1 1 1 1 1
    Set q0           1 1 1 1 1     0 0 0 0
  Third cycle
    Shift            1 1 1 1 0     0 0 0 _
    Add              0 0 0 0 1
    Set q0           0 0 0 0 1     0 0 0 1
  Fourth cycle
    Shift            0 0 0 1 0     0 0 1 _
    Subtract         1 1 1 1 1
    Set q0           1 1 1 1 1     0 0 1 0   (Quotient)
  Restore remainder
    Add              0 0 0 1 0               (Remainder)
Figure 6.23. A nonrestoring division example.
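The two-step nonrestoring procedure can be sketched in the same style. As before, the function name is an assumption and a signed Python integer models register A, so shifting a negative A left is written as 2 * a.

```python
def nonrestoring_divide(dividend, divisor, n=4):
    """Unsigned n-bit nonrestoring division; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):                       # Step 1, repeated n times
        msb = (q >> (n - 1)) & 1
        q = (q << 1) & mask
        if a >= 0:                           # sign of A is 0: shift, then subtract M
            a = 2 * a + msb - m
        else:                                # sign of A is 1: shift, then add M
            a = 2 * a + msb + m
        if a >= 0:
            q |= 1                           # set q0 to 1, otherwise it stays 0
    if a < 0:                                # Step 2: restore the remainder once
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b11))     # the example of Figure 6.23
```

Only one add or subtract is performed per cycle, plus at most one final correction.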
Floating-Point Numbers
So far we have dealt with fixed-point numbers (what is it?), and have considered them as integers. If we consider them as fractions instead, the binary point is just to the right of the sign bit:
B = b0 . b-1 b-2 ... b-(n-1)
F(B) = -b0 x 2^0 + b-1 x 2^-1 + b-2 x 2^-2 + ... + b-(n-1) x 2^-(n-1)
In a floating-point number, the position of the binary point is variable and is automatically adjusted as computation proceeds.
Floating-Point Numbers
What is needed to represent a floating-point decimal number?
Sign
Mantissa (the significant digits)
Exponent to an implied base (scale factor)
Normalized: the decimal point is placed to the right of the first (nonzero) significant digit.
Think about this number (all digits are decimal): ±X1.X2X3X4X5X6X7 x 10^(±Y1Y2)
It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits: a 24-bit mantissa (including 1 sign bit for a signed number) and an 8-bit exponent.
Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.
IEEE Standard
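(a) Single precision, 32 bits, with fields S, E', M:
S: sign of number, 1 bit (0 signifies +, 1 signifies -)
E': 8-bit signed exponent in excess-127 representation
M: 23-bit mantissa fraction
Value represented = ±1.M x 2^(E' - 127)
Example: S = 0, E' = 00101000, M = 001010...0
(00101000)2 = (40)10, 40 - 127 = -87, so the value represented is +1.001010...0 x 2^-87
(b) Double precision, 64 bits: sign bit, 11-bit excess-1023 exponent, 52-bit mantissa fraction.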
IEEE Standard
For excess-127 format, 0 ≤ E' ≤ 255. However, 0 and 255 are used to represent special values, so actually 1 ≤ E' ≤ 254. That means -126 ≤ E ≤ 127.
Single precision uses 32 bits. The scale factor ranges from 2^-126 to 2^+127.
Double precision uses 64 bits. The scale factor ranges from 2^-1022 to 2^+1023.
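To make the field layout concrete, the sketch below uses Python's standard `struct` module to pull S, E', and M out of a single-precision bit pattern and to rebuild the value of a normal number. The function names are hypothetical, and only the normal range 1 ≤ E' ≤ 254 is covered.

```python
import struct


def decode_single(value):
    """Round a Python float to IEEE single precision and return (S, E', M)."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31
    e_stored = (word >> 23) & 0xFF           # excess-127 exponent E'
    fraction = word & 0x7FFFFF               # 23-bit mantissa fraction M
    return sign, e_stored, fraction


def normal_value(sign, e_stored, fraction):
    """Value of a normal number: (+/-) 1.M x 2^(E' - 127), for 1 <= E' <= 254."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (e_stored - 127)


print(decode_single(40.0))
print(normal_value(0, 40, 0b00101 << 18))    # S = 0, E' = 40, M = 00101 0...0
```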
Two Aspects
If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.
(a) Unnormalized value: S = 0, excess-127 exponent E' = 10001000, fraction 0010110... (There is no implicit 1 to the left of the binary point.)
(10001000)2 = (136)10, 136 - 127 = 9
Value represented = +0.0010110... x 2^9
(b) Normalized version: S = 0, excess-127 exponent E' = 10000101, fraction 0110...
(10000101)2 = (133)10, 133 - 127 = 6
Value represented = +1.0110... x 2^6
Figure 6.25. Floating-point normalization in IEEE single-precision format.
Two Aspects
As computations proceed, a number that does not fall in the representable range of normal numbers might be generated. It requires an exponent less than -126 (underflow) or greater than +127 (overflow). Both are exceptions that need to be considered.
Special Values
The end values 0 and 255 of the stored exponent E' are used to represent special values.
When E' = 0 and M = 0, the exact value 0 is represented (±0).
When E' = 255 and M = 0, the value infinity is represented (±∞).
When E' = 0 and M ≠ 0, denormal numbers are represented. The value is ±0.M x 2^-126.
When E' = 255 and M ≠ 0, the value is Not a Number (NaN).
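The four special cases translate into a small classifier. This is a sketch with hypothetical function names; `e_stored` is the 8-bit exponent field and `fraction` the 23-bit mantissa field M as an integer.

```python
def classify(e_stored, fraction):
    """Classify a single-precision pattern by its exponent and mantissa fields."""
    if e_stored == 0:
        return "zero" if fraction == 0 else "denormal"
    if e_stored == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"


def denormal_value(sign, fraction):
    """Value of a denormal number: (+/-) 0.M x 2^-126."""
    return (-1) ** sign * (fraction / 2 ** 23) * 2.0 ** -126


print(classify(0, 0), classify(255, 0), classify(0, 1), classify(255, 1))
print(denormal_value(0, 1))      # smallest positive denormal
```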
Exceptions
A processor must set exception flags if any of the following occur in performing operations: underflow, overflow, divide by zero, inexact (requires rounding), invalid (0/0).
When an exception occurs, the results are set to special values.
Add/Subtract rule
1. Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2. Set the exponent of the result equal to the larger exponent.
3. Perform addition/subtraction on the mantissas and determine the sign of the result.
4. Normalize the resulting value, if necessary.
Multiply rule
1. Add the exponents and subtract 127.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
Divide rule
1. Subtract the exponents and add 127.
2. Divide the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
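The add and multiply rules can be sketched on (S, E', M) field triples. This is a simplified illustration under stated assumptions: operands and results are normal single-precision numbers, no guard bits are kept, discarded bits are simply chopped, and overflow, underflow, and special values are ignored. The function names are hypothetical.

```python
HIDDEN = 1 << 23                      # the implicit 1 of 1.M, as a 24-bit integer


def fp_add(a, b):
    """Add two numbers given as (S, E', M); follows the Add/Subtract rule."""
    if a[1] < b[1]:
        a, b = b, a                                   # a has the larger exponent
    (sa, ea, ma), (sb, eb, mb) = a, b
    sig_a = HIDDEN | ma
    sig_b = (HIDDEN | mb) >> (ea - eb)                # shift the smaller one right
    total = (-sig_a if sa else sig_a) + (-sig_b if sb else sig_b)
    if total == 0:
        return 0, 0, 0
    sign, mag, e = (1 if total < 0 else 0), abs(total), ea
    while mag >> 24:                                  # normalize: 2 <= value
        mag, e = mag >> 1, e + 1
    while not mag >> 23:                              # normalize: value < 1
        mag, e = mag << 1, e - 1
    return sign, e, mag & 0x7FFFFF


def fp_multiply(a, b):
    """Multiply two numbers given as (S, E', M); follows the Multiply rule."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    e = ea + eb - 127                                 # add exponents, subtract 127
    product = (HIDDEN | ma) * (HIDDEN | mb)           # 1.M x 1.M with 46 fraction bits
    if product >> 47:                                 # product in [2, 4): normalize
        product, e = product >> 1, e + 1
    return sa ^ sb, e, (product >> 23) & 0x7FFFFF


print(fp_add((0, 127, 0x400000), (0, 128, 0)))        # 1.5 + 2.0
print(fp_multiply((0, 127, 0x400000), (0, 128, 0)))   # 1.5 x 2.0
```

The divide rule follows the same pattern with the exponents subtracted and 127 added back.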
During the intermediate steps, it is important to retain extra bits, often called guard bits, to yield the maximum accuracy in the final results.
Removing the guard bits in generating a final result requires truncation of the extended mantissa. How?
Chopping: biased, error from 0 to 1 at the LSB. All fractions in the range 0.b-1b-2b-3000 to 0.b-1b-2b-3111 are truncated to 0.b-1b-2b-3.
Von Neumann rounding (if any of the bits to be removed are 1, the LSB of the retained bits is set to 1): unbiased, error from -1 to +1 at the LSB. All 6-bit fractions with b-4b-5b-6 not equal to 000 are truncated to 0.b-1b-21. Why is unbiased rounding better when many operands are involved?
Rounding (a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed): unbiased, error from -1/2 to +1/2 at the LSB.
Round to the nearest number, or to the nearest even number in case of a tie (0.b-1b-20100 is truncated to 0.b-1b-20, and 0.b-1b-21100 to 0.b-1b-21 + 0.001). Best accuracy; most difficult to implement.
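The three truncation methods can be compared on the 6-bit fractions used in the text, keeping three bits and removing three. The function names and the integer encoding (the fraction 0.b-1...b-6 held as an integer 0..63) are assumptions made for this sketch.

```python
def chop(bits6):
    """Chopping: drop the three removed bits."""
    return bits6 >> 3


def von_neumann(bits6):
    """Von Neumann rounding: if any removed bit is 1, force the retained LSB to 1."""
    kept = bits6 >> 3
    return kept | 1 if bits6 & 0b111 else kept


def round_nearest_even(bits6):
    """Rounding to nearest, with ties going to the even retained value."""
    kept, removed = bits6 >> 3, bits6 & 0b111
    if removed > 0b100 or (removed == 0b100 and kept & 1):
        kept += 1        # may reach 0b1000 (value 1.000), which needs renormalization
    return kept


for x in (0b010100, 0b011100, 0b010101, 0b100001):
    print(format(x, "06b"), format(chop(x), "03b"),
          format(von_neumann(x), "03b"), format(round_nearest_even(x), "03b"))
```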
In most general-purpose processors, floating-point operations are available at the machine-instruction level, implemented in hardware. In high-performance processors, a significant portion of the chip area is assigned to floating-point operations.
Addition/subtraction circuitry:
Figure 6.26. Floating-point addition-subtraction unit. The 32-bit operands are A: SA, E'A, MA and B: SB, E'B, MB. An 8-bit subtractor forms n = |E'A - E'B| and its sign. SWAP selects the M of the number with the smaller E', which is shifted right, and the M of the number with the larger E', which goes directly to the mantissa adder/subtractor. A combinational CONTROL network takes SA, SB, and the Add/Subtract input and produces the Add/Sub and Sign signals. A leading zeros detector (output X) follows the mantissa adder/subtractor, and the unit outputs the magnitude M of the result.