EEE105 S1Y1617 Lect 09 PDF

EEE 105: COMPUTER ORGANIZATION
Lecture 3
Division, Floating-Point Arithmetic

Division (1/5)
Definition of terms
  A = register to compare with the divisor (n+1 bits)
    initialized to 0 at the beginning of division
    holds the remainder at the end of the operation
  M = divisor register (n bits)
  Q = dividend/quotient register
    initially the dividend; as each bit is shifted out to the left, the quotient is shifted in from the right
Lecture 11: Division, Floating-Point Arithmetic

Division (2/5)
Restoring division algorithm
  based on manual division algorithm
  called restoring because it restores the remainder to its previous value if it is negative
  procedure
    shift A and Q left by one bit position
    subtract M from A, and place the answer back in A
    if the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise set q0 to 1
    repeat n times
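
The restoring procedure can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the name restoring_divide and its parameters are assumptions, the operands are unsigned n-bit integers, and an ordinary Python integer stands in for the (n+1)-bit A register, so a negative value of a plays the role of "sign of A is 1".

```python
def restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left as one register pair: the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Subtract M from A and place the answer back in A.
        a -= m
        if a < 0:
            a += m   # sign of A is 1: q0 stays 0 and A is restored
        else:
            q |= 1   # sign of A is 0: set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b0011, 4))  # (2, 2): 8 / 3 = 2 remainder 2
```

The call at the end uses the same 4-bit operands as the worked example in these slides (Q = 1000, M = 0011).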

Division (3/5)

Division (4/5)
Non-restoring division algorithm
  avoids adding the divisor back whenever a negative result occurs
  if A >= 0
    shift left (A and Q), subtract (A = A - M)
    equivalent to 2A - M
  if A < 0
    the restoring algorithm would restore (A = A + M), shift left (A and Q), subtract (A = A - M)
    equivalent to 2A + M {i.e. 2(A + M) - M}, so shift left and add M instead

Division (5/5)
Non-restoring division algorithm (contd)
  procedure
    step 1: do the following n times
      if sign(A) = 0, shift left (A Q) by one bit position and subtract M from A; else, shift left (A Q) by one bit position and add M to A
      if the sign of A is 0, set q0 to 1; else set q0 to 0
    step 2: if the sign of A is 1, add M to A (A holds the remainder afterwards)
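
The two-step non-restoring procedure can be sketched the same way. Again this is only an illustrative sketch under assumptions: nonrestoring_divide is a hypothetical name, operands are unsigned n-bit integers, and a signed Python integer models the (n+1)-bit A register (shifting A left while bringing in the MSB of Q is written as 2*a + bit).

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    # Step 1: repeat n times.
    for _ in range(n):
        was_negative = a < 0
        # Shift left (A Q) by one bit position.
        a = 2 * a + ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Subtract M if A was non-negative before the shift, otherwise add M.
        a = a + m if was_negative else a - m
        # Set q0 to 1 when the sign of A is 0.
        if a >= 0:
            q |= 1
    # Step 2: a final add leaves the true remainder in A.
    if a < 0:
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b0011, 4))  # (2, 2)
```

Compared with the restoring version, each cycle performs exactly one addition or subtraction; only the final correction in step 2 may add M back.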

Examples for Division
Details
  4-bit example
    Q = 1000_2 = 8_10 (dividend)
    M = 0011_2 = 3_10 (divisor)
  using long-hand division

             10      (quotient)
      11 ) 1000
          - 11
           ----
             10      (remainder)

Restoring Division (1/2)

                           S A      Q
  initial configuration    0 0000   1000
                           0 0011            (M)
  first cycle
    Shift                  0 0001   000_
    Subtract               1 1101            (add two's complement of M)
    Set q0                 1 1110   0000
    Restore                0 0011            (add M)
                           0 0001   0000
  second cycle
    Shift                  0 0010   000_
    Subtract               1 1101
    Set q0                 1 1111   0000
    Restore                0 0011
                           0 0010   0000

Restoring Division (2/2)

                           S A      Q
  third cycle
    Shift                  0 0100   000_
    Subtract               1 1101
    Set q0                 0 0001   0001
  fourth cycle
    Shift                  0 0010   001_
    Subtract               1 1101
    Set q0                 1 1111   0010
    Restore                0 0011
                           0 0010   0010
                         remainder  quotient

Non-restoring Division

                           S A      Q
  initial configuration    0 0000   1000
                           0 0011            (M)
  first cycle
    Shift                  0 0001   000_
    Subtract               1 1101
    Set q0                 1 1110   0000
  second cycle
    Shift                  1 1100   000_
    Add                    0 0011
    Set q0                 1 1111   0000
  third cycle
    Shift                  1 1110   000_
    Add                    0 0011
    Set q0                 0 0001   0001
  fourth cycle
    Shift                  0 0010   001_
    Subtract               1 1101
    Set q0                 1 1111   0010     (quotient)
  Restore remainder
    Add                    0 0011
                           0 0010            (remainder)

Floating Point Numbers (1/3)
  Many values in scientific calculations cannot be represented as integers
  Examples
    6.0221 x 10^23 mol^-1 (Avogadro's number)
    1.6022 x 10^-19 C (magnitude of electron charge)
  Radix point can vary in position
  Can represent a wider range of numbers compared to fixed-point numbers

Floating Point Numbers (2/3)
Examples (contd)
  numbers are given to 5 significant digits
  scale factors 10^23, 10^-19 indicate position of decimal point relative to significant digits
  A number is said to be normalized when the decimal point is placed to the right of the first nonzero significant digit

Floating Point Numbers (3/3)
For a decimal system
  X1.X2X3X4X5X6X7 x 10^(Y1Y2)
  mantissa: X1X2X3X4X5X6X7 (string of significant digits)
  number of significant digits = 7
  range of exponent = +/- 99
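
As a small illustration of the normalized decimal form, Python's "e" format specifier already places the decimal point to the right of the first nonzero digit. The helper name normalized_decimal is an assumption made for this sketch, and it does not check the +/- 99 exponent limit.

```python
def normalized_decimal(value, digits=7):
    """Format value as X1.X2...Xd x 10^Y with `digits` significant digits."""
    mantissa, exponent = f"{value:.{digits - 1}e}".split("e")
    return f"{mantissa} x 10^{int(exponent)}"


print(normalized_decimal(12345.678))      # 1.234568 x 10^4
print(normalized_decimal(1.6022e-19, 5))  # 1.6022 x 10^-19
```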

IEEE754 standard (1/5)
  Defines functionality of floating-point representation and arithmetic
  Specifies the following:
    basic and extended floating-point number formats
      32 bit Single precision format
        sign (S): 1 bit | exponent (E): 8 bits | mantissa (M): 23 bits
        Conversion: N = (-1)^S x 2^(E-127) x (1.M), where the leading 1 is the implied 1
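
The single precision layout and conversion formula can be checked with Python's standard struct module. This is a sketch only: single_fields and single_value are hypothetical helper names, and single_value applies the formula for normalized numbers, so it does not handle E = 0 or E = 255.

```python
import struct


def single_fields(x):
    """Return (S, E, M) of x after rounding it to IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, implied 1 not stored
    return sign, exponent, mantissa


def single_value(sign, exponent, mantissa):
    """Apply N = (-1)^S x 2^(E-127) x (1.M) to a normalized number."""
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)


s, e, m = single_fields(-6.5)
print(s, e, format(m, "023b"))   # 1 129 10100000000000000000000
print(single_value(s, e, m))     # -6.5
```

Here -6.5 = -1.101 (binary) x 2^2, so E = 2 + 127 = 129 and M holds the bits after the implied 1.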

IEEE754 standard (2/5)
  basic formats (contd)
    64 bit Double precision format
      sign (S): 1 bit | exponent (E): 11 bits | mantissa (M): 52 bits
      Conversion: N = (-1)^S x 2^(E-1023) x (1.M)
    extended precision formats are defined to allow for extended range and precision
      reduces round-off errors during intermediate calculations
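
To make the effect of the wider mantissa concrete, the sketch below extracts the double precision fields and compares how closely each basic format stores 0.1. The helper names double_fields and to_single are assumptions; exact errors are computed with the standard fractions module.

```python
import struct
from fractions import Fraction


def double_fields(x):
    """Return (S, E, M) of an IEEE 754 double: 1, 11 and 52 bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def to_single(x):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


exact = Fraction(1, 10)
error_single = abs(Fraction(to_single(0.1)) - exact)
error_double = abs(Fraction(0.1) - exact)

print(double_fields(1.0))           # (0, 1023, 0)
print(error_double < error_single)  # True
```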

IEEE754 standard (3/5)
  basic formats (contd)
    example of extended format: E has 15 bits and M has 64 bits
    requires that numbers be normalized so that the implied 1 is represented correctly
    however, some values are too small to be normalized and are represented by denormal numbers (E = 0 and M != 0)

IEEE754 standard (4/5)
  floating-point operations
    add/subtract
    multiply
    divide
    square root
    remainder and compare
    conversions between integer and floating-point formats and between different floating-point formats

IEEE754 standard (5/5)
  floating-point exceptions and their handling
    invalid operation
    division by zero
    overflow: resulting E becomes more than max(E)
    underflow: resulting E becomes less than 0
    inexact: result is rounded off to fit into format
  infinity and not-a-number (NaN) arithmetic
    infinity is represented by E = max(E) and M = 0
    NaN is represented by E = max(E) and M != 0
  rounding modes
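
The special encodings (denormal, infinity, NaN) can be told apart from the E and M fields alone. The sketch below does this for single precision bit patterns; classify_single and bits_of are hypothetical helper names, and max(E) is taken to be the all-ones exponent field, 255.

```python
import struct


def classify_single(bits):
    """Classify a 32-bit single precision pattern from its E and M fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:              # E = max(E)
        return "infinity" if mantissa == 0 else "NaN"
    if exponent == 0:                 # E = 0
        return "zero" if mantissa == 0 else "denormal"
    return "normal"


def bits_of(x):
    """Bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


for x in (1.0, float("inf"), float("nan")):
    print(x, classify_single(bits_of(x)))
print(classify_single(0x00000001))  # denormal (smallest positive value)
```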

Floating-Point Addition and Subtraction (1/3)
  Compared to fixed-point addition, it has more steps because of the format
  Mantissa addition is the same as fixed-point addition
  Same procedure is used for subtraction

  1. Compare exponents
  2. If one exponent is smaller, its mantissa must be shifted right by the difference between the exponents
    Example: If E1 = 100 and E2 = 95, M2 must be shifted 5 steps to the right
      M2 = 10010100010010100101000 (implied 1 is included) becomes
      M2 = 00000100101000100101001
  3. Meanwhile, the sign bit is used to determine whether true addition or subtraction must be done
  4. Mantissa addition/subtraction is performed
  5. Normalize result mantissa if needed
  6. Round off result mantissa before placing into final format
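
The six steps can be sketched on (sign, exponent, significand) triples. Everything here is an assumption made for illustration: the name fp_add, the use of a 24-bit significand that already includes the implied 1, three extra low-order bits kept during the operation, and round to nearest even in the last step. Exponent overflow, underflow, and special values are not handled.

```python
GUARD_BITS = 3   # extra low-order bits kept during the operation
SIG_BITS = 24    # 23 mantissa bits plus the implied 1


def fp_add(x, y):
    """Add two normalized numbers given as (sign, exponent, significand)."""
    (s1, e1, m1), (s2, e2, m2) = x, y
    # Steps 1-2: compare exponents, shift the mantissa of the smaller one right.
    if e1 < e2:
        (s1, e1, m1), (s2, e2, m2) = (s2, e2, m2), (s1, e1, m1)
    m1 <<= GUARD_BITS
    m2 <<= GUARD_BITS
    shift = e1 - e2
    lost = m2 & ((1 << shift) - 1)
    m2 = (m2 >> shift) | (1 if lost else 0)   # remember that 1s were shifted out
    # Steps 3-4: the signs decide between true addition and subtraction.
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    if total == 0:
        return (0, 0, 0)
    sign, mag, exp = (1 if total < 0 else 0), abs(total), e1
    # Step 5: normalize so the leading 1 is back in the top position.
    top = 1 << (SIG_BITS + GUARD_BITS)
    while mag >= top:
        mag = (mag >> 1) | (mag & 1)
        exp += 1
    while mag < top >> 1:
        mag <<= 1
        exp -= 1
    # Step 6: round to nearest even using the three extra bits.
    removed = mag & 0b111
    mag >>= GUARD_BITS
    if removed > 0b100 or (removed == 0b100 and mag & 1):
        mag += 1
        if mag == 1 << SIG_BITS:   # rounding overflowed: renormalize
            mag >>= 1
            exp += 1
    return (sign, exp, mag)


# 1.5 + 0.25 = 1.75, with biased single precision exponents 127 and 125
print(fp_add((0, 127, 0xC00000), (0, 125, 0x800000)) == (0, 127, 0xE00000))  # True
```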

Floating-Point Multiplication and Division
  1. Determine sign of result
  2. Determine exponent of result
    for multiplication: E = E1 + E2 - bias
    for division: E = E1 - E2 + bias
  3. Perform mantissa multiplication/division (similar to integer multiplication/division)
  4. Normalize result mantissa if needed
  5. Round off result mantissa before placing into final format
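
For multiplication, the steps can be sketched on the same kind of triples. The name fp_multiply, the 24-bit significand with the implied 1 included, and the choice of round to nearest even are assumptions for this sketch; exponent overflow, underflow, and special values are ignored. Division would follow the same outline with E = E1 - E2 + bias.

```python
def fp_multiply(x, y, bias=127):
    """Multiply two normalized (sign, biased exponent, 24-bit significand) values."""
    (s1, e1, m1), (s2, e2, m2) = x, y
    sign = s1 ^ s2                 # step 1: sign of result
    exp = e1 + e2 - bias           # step 2: E = E1 + E2 - bias
    product = m1 * m2              # step 3: integer multiply, 47 or 48 bits
    # Step 4: normalize. The product of two values in [1, 2) lies in [1, 4).
    if product >= 1 << 47:
        exp += 1
        drop = 24
    else:
        drop = 23
    # Step 5: round to nearest even while removing the low-order bits.
    removed = product & ((1 << drop) - 1)
    mag = product >> drop
    half = 1 << (drop - 1)
    if removed > half or (removed == half and mag & 1):
        mag += 1
        if mag == 1 << 24:         # rounding overflowed: renormalize
            mag >>= 1
            exp += 1
    return (sign, exp, mag)


# 1.5 x 2.5 = 3.75 = 1.875 x 2^1
print(fp_multiply((0, 127, 0xC00000), (0, 128, 0xA00000)) == (0, 128, 0xF00000))  # True
```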

Floating-Point Arithmetic (1/5)
  Intermediate values of exponents and mantissas may need to be represented in more bits
  Guard bits
    additional bits in mantissa retained during intermediate steps of operations
    maintains additional accuracy in final results

  After an operation, results must be fit into a specific format
    reduction in bit width
  truncation
    extra bits are simply discarded
    biased approximation since error range is not symmetrical about 0
    example: truncate from 6 bits to 3 bits
      0.001000 -> 0.001
      0.001111 -> 0.001
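
Truncation is a plain right shift when the fraction bits are held in an integer. The sketch below (chop is a hypothetical name) reproduces the 6-bit to 3-bit example and sums the signed error over every 6-bit fraction to show the bias.

```python
def chop(value, drop):
    """Truncate by discarding the `drop` low-order bits."""
    return value >> drop


for v in (0b001000, 0b001111):
    print(f"0.{v:06b} -> 0.{chop(v, 3):03b}")

# Signed error (result minus true value) in units of 2^-6, over all 64 inputs.
total_error = sum((chop(v, 3) << 3) - v for v in range(64))
print(total_error)  # -224: the error is never positive, so it does not average out
```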

  Von Neumann rounding
    if bits to be removed are all 0, bits are simply dropped without affecting retained bits
    if any of the bits to be removed are 1, least significant bit of retained bits is set to 1
    unbiased approximation
    larger error range
    same maximum magnitude of error as chopping
    example: from 6 bits to 3 bits
      0.001000 -> 0.001
      0.001001 -> 0.001
      0.010001 -> 0.011
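
Von Neumann rounding needs only an OR, no addition. In this sketch (von_neumann_round is a hypothetical name) the signed errors over all 6-bit fractions cancel, which illustrates the unbiased behaviour.

```python
def von_neumann_round(value, drop):
    """Drop `drop` low-order bits; force the retained LSB to 1 if any were 1."""
    kept = value >> drop
    if value & ((1 << drop) - 1):
        kept |= 1
    return kept


for v in (0b001000, 0b001001, 0b010001):
    print(f"0.{v:06b} -> 0.{von_neumann_round(v, 3):03b}")

# Signed error in units of 2^-6, summed over all 64 inputs.
total_error = sum((von_neumann_round(v, 3) << 3) - v for v in range(64))
print(total_error)  # 0: positive and negative errors cancel
```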

  rounding to nearest (even)
    unbiased, closest approximation
    if the bits to be removed have an MSB of 1 and contain other 1s, add 1 to the LSB of the bits to be retained
    if the bits to be removed have an MSB of 0, simply truncate
    if the bits to be removed have an MSB of 1 followed only by 0s (a tie), round to the nearest even value: add 1 only if the LSB of the retained bits is 1
    achieves the smallest error range but is also the most difficult to implement because of the addition and possible renormalization

  rounding to nearest (even) (contd)
    3 guard bits are required to implement rounding
      first two guard bits are part of mantissa to be removed
      third is OR of all bits beyond first two
    example: from 6 bits to 3 bits
      0.001101 -> 0.001 + 0.001 = 0.010
      0.010011 -> 0.010
      0.001100 -> 0.001 + 0.001 = 0.010
      0.010100 -> 0.010
      0.010101 -> 0.010 + 0.001 = 0.011
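
Rounding to nearest (even) and the three guard bits can be sketched as below. Both function names are assumptions. The second function reduces the removed bits to the first two plus an OR of the rest and assumes at least 3 bits are removed; for any such input it makes the same decision as the first.

```python
def round_nearest_even(value, drop):
    """Round away `drop` low-order bits, sending ties to the even result."""
    kept = value >> drop
    removed = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


def round_with_guard_bits(value, drop):
    """Same decision using only 3 guard bits (requires drop >= 3)."""
    kept = value >> drop
    removed = value & ((1 << drop) - 1)
    first_two = removed >> (drop - 2)
    sticky = 1 if removed & ((1 << (drop - 2)) - 1) else 0  # OR of the rest
    guard = (first_two << 1) | sticky
    if guard > 0b100 or (guard == 0b100 and kept & 1):
        kept += 1
    return kept


for v in (0b001101, 0b010011, 0b001100, 0b010100, 0b010101):
    print(f"0.{v:06b} -> 0.{round_nearest_even(v, 3):03b}")
```

The loop prints the five cases of the 6-bit to 3-bit example; the two ties (0.001100 and 0.010100) both land on the even value 0.010.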
Floating-Point Adder
