Professional Documents
Culture Documents
Floating Point Sept 6, 2006 15-213: "The Course That Gives CMU Its Zip!"
Floating Point Sept 6, 2006 15-213: "The Course That Gives CMU Its Zip!"
Floating Point Sept 6, 2006 15-213: "The Course That Gives CMU Its Zip!"
Floating Point
Sept 6, 2006
Topics
class03.ppt
15-213, F06
x == (int)(float) x
int x = ;
float f = ;
double d = ;
Assume neither
d nor f is NaN
x == (int)(double) x
f == (float)(double) f
d == (float) d
f == -(-f);
2/3 == 2/3.0
d < 0.0
d > f
-f > -d
d * d >= 0.0
(d+f)-d == f
2
15-213, F06
defining standard
15-213, F06
bi bi1
b2 b1
4
2
1
b0 . b1 b2 b3
1/2
1/4
1/8
bj
2j
Representation
bk 2 k
k j
4
15-213, F06
5-3/4
2-7/8
63/64
Representation
101.112
10.1112
0.1111112
Observations
15-213, F06
Representable Numbers
Limitation
Value
1/3
1/5
1/10
Representation
0.0101010101[01]2
0.001100110011[0011]2
0.0001100110011[0011]2
15-213, F06
1s M 2E
Encoding
s
exp
frac
15-213, F06
exp
frac
Sizes
1 bit wasted
15-213, F06
15-213, F06
Significand
M
=
frac =
Exponent
E
=
Bias =
Exp =
1.11011011011012
110110110110100000000002
13
127
140 =
100011002
15213:
6
6
D
B
4
0
0
0100 0110 0110 1101 1011 0100 0000
100 0110 0
1110 1101 1011 01
15-213, F06
Denormalized Values
Condition
exp = 0000
Value
Cases
Represents value 0
Note that have distinct values +0 and 0
11
Gradual underflow
15-213, F06
Special Values
Condition
exp = 1111
Cases
Not-a-Number (NaN)
Represents case when no numeric value can be determined
E.g., sqrt(1),
12
15-213, F06
NaN
13
-Normalized
+Denorm
-Denorm
0
+0
+Normalized
+
NaN
15-213, F06
normalized, denormalized
14
3 2
exp
frac
15-213, F06
15
Exp
exp
2E
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
-6
-6
-5
-4
-3
-2
-1
0
+1
+2
+3
+4
+5
+6
+7
n/a
1/64
1/64
1/32
1/16
1/8
1/4
1/2
1
2
4
8
16
32
64
128
(denorms)
(inf, NaN)
15-213, F06
Dynamic Range
E
Value
0000 000
0000 001
0000 010
-6
-6
-6
0
1/8*1/64 = 1/512
2/8*1/64 = 2/512
closest to zero
0000
0000
0001
0001
110
111
000
001
-6
-6
-6
-6
6/8*1/64
7/8*1/64
8/8*1/64
9/8*1/64
=
=
=
=
6/512
7/512
8/512
9/512
largest denorm
smallest norm
0110
0110
0111
0111
0111
110
111
000
001
010
-1
-1
0
0
0
14/8*1/2
15/8*1/2
8/8*1
9/8*1
10/8*1
=
=
=
=
=
14/16
15/16
1
9/8
10/8
7
7
n/a
14/8*128 = 224
15/8*128 = 240
inf
s exp
0
0
Denormalized 0
numbers
0
0
0
0
0
0
Normalized 0
numbers
0
0
0
0
0
16
frac
1110 110
1110 111
1111 000
closest to 1 below
closest to 1 above
largest norm
15-213, F06
Distribution of Values
6-bit IEEE-like format
e = 3 exponent bits
f = 2 fraction bits
Bias is 3
-10
-5
Denormalized
17
0
Normalized
10
15
Infinity
15-213, F06
Distribution of Values
(close-up view)
6-bit IEEE-like format
e = 3 exponent bits
f = 2 fraction bits
Bias is 3
-1
-0.5
Denormalized
18
0
Normalized
0.5
Infinity
15-213, F06
Interesting Numbers
Description
exp
Zero
0000 0000
0.0
0000 0001
2 {23,52} X 2 {126,1022}
0000 1111
(1.0 ) X 2 {126,1022}
Largest Denormalized
frac
Numeric Value
1.0 X 2 {126,1022}
One
0111 0000
1.0
Largest Normalized
1110 1111
(2.0 ) X 2{127,1023}
19
15-213, F06
All bits = 0
Must consider -0 = 0
NaNs problematic
Otherwise OK
20
15-213, F06
$1.60
$1.50
$2.50
$1.50
Zero
$1
$1
$1
$2
$1
$1
$1
$1
$2
$2
Round up (+)
$2
$2
$2
$3
$1
$1
$2
$2
$2
$2
Note:
1. Round down: rounded result is close to but no greater than true result.
2. Round up: rounded result is close to but no less than true result.
21
15-213, F06
estimated
22
1.23
1.24
1.24
1.24
Examples
Value
Binary
2 3/32
10.000112 10.002
(<1/2down)
2 3/16
10.001102 10.012
(>1/2up)
2 1/4
2 7/8
10.111002 11.002
(1/2up)
2 5/8
10.101002 10.102
(1/2down)
2 1/2
23
Rounded Action
Rounded Value
15-213, F06
FP Multiplication
Operands
(1)s1 M1 2E1
(1)s2 M2 2E2
Exact Result
(1)s M 2E
Sign s: s1 ^s2
Significand M:
M1 *M2
Exponent E:
E1 +E2
Fixing
Implementation
24
FP Addition
Operands
(1)s1 M1 2E1
E1E2
(1)s2 M2 2E2
(1)s1 M1
Assume E1 > E2
Exact Result
(1)s M 2E
(1)s2 M2
Sign s, significand M:
(1)s M
Exponent E:
E1
Fixing
25
YES
Commutative?
YES
Associative?
NO
0 is additive identity?
YES
ALMOST
Monotonicity
a b a+c b+c?
26
ALMOST
15-213, F06
YES
Multiplication Commutative?
YES
Multiplication is Associative?
NO
1 is multiplicative identity?
YES
Monotonicity
a b & c 0 a *c b *c?
Except for infinities & NaNs
27
ALMOST
15-213, F06
7 6
3 2
exp
frac
Case Study
28
Example Numbers
128
10000000
15
00001101
33
00010001
35
00010011
138
10001010
63
00111111
15-213, F06
Normalize
7 6
3 2
exp
frac
Requirement
29
Value
Binary
Fraction
Exponent
128
10000000
1.0000000
15
00001101
1.1010000
17
00010001
1.0001000
19
00010011
1.0011000
138
10001010
1.0001010
63
00111111
1.1111100
5
15-213, F06
Rounding
1.BBGRXXX
Guard bit: LSB of result
Round up conditions
30
Fraction
GRS
Incr? Rounded
128
1.0000000
000
1.000
15
1.1010000
100
1.101
17
1.0001000
010
1.000
19
1.0011000
110
1.010
138
1.0001010
111
1.001
63
1.1111100
111
10.000
15-213, F06
Postnormalize
Issue
31
Rounded
Exp
Adjusted
128
1.000
128
15
1.101
15
17
1.000
16
19
1.010
20
138
1.001
134
63
10.000
1.000/6
Result
64
15-213, F06
Floating Point in C
C Guarantees Two Levels
float
single precision
double
double precision
Conversions
int to double
int to float
32
15-213, F06
Default Format
Currency Format
33
Number
Subtract 16 Subtract .3 Subtract .01
16.31
0.31
0.01 -1.2681E-15
$16.31
$0.31
$0.01
($0.00)
15-213, F06
Summary
IEEE Floating Point Has Clear Mathematical Properties
Violates associativity/distributivity
Makes life difficult for compilers & serious numerical
applications programmers
34
15-213, F06