Float Practice Problem Sheet 1602094638087
3. True or false.
(i) Floating point addition is commutative.
(ii) Floating point addition is associative.
Given a = 1234.567, b = 45.67840 and c = 0.0004, check whether the associative property holds
(use 7-digit mantissa decimal arithmetic).
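One way to check a hand computation for part (ii) is Python's `decimal` module, which rounds to a chosen number of significant digits after every operation. This is only a sketch: the sheet does not say whether the 7-digit arithmetic rounds or chops, so round-to-nearest (the module's default) is an assumption here, and the names `left` and `right` are illustrative.

```python
from decimal import Decimal, localcontext

a = Decimal("1234.567")
b = Decimal("45.67840")
c = Decimal("0.0004")

with localcontext() as ctx:
    ctx.prec = 7  # assumption: 7 significant digits, round to nearest
    left = (a + b) + c   # a + b is rounded to 7 digits before c is added
    right = a + (b + c)  # b + c fits in 7 digits, so only the final sum is rounded

print("(a + b) + c =", left)
print("a + (b + c) =", right)
print("associative for these values:", left == right)
```

Under chopping instead of rounding the comparison can come out differently, so it is worth repeating the check with `ctx.rounding = decimal.ROUND_DOWN`.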
4. Consider the floating point system with $\beta = 10$ and $t = 4$. Find $fl(x - y)$, given that
$x = 2.552 \times 10^{3}$ and $y = 2.551 \times 10^{2}$.
Verify that
$$\left| \frac{fl(x - y) - (x - y)}{x - y} \right| \le \frac{1}{2} \beta^{-t+1},$$
where $fl(x - y)$ is a $t$-digit rounded approximation to $(x - y)$.
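The bound in problem 4 can be checked numerically with a short sketch like the one below. It assumes that "rounded" means round-to-nearest and uses the `decimal` module with a 4-digit context for the rounded subtraction; the variable names (`fl_diff`, `rel_err`, `bound`) are illustrative and not from the sheet.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

beta, t = 10, 4
x = Decimal("2.552E3")
y = Decimal("2.551E2")

# The default context carries 28 digits, so this difference is exact.
exact = x - y

# Subtraction rounded to t significant digits (assumed round to nearest).
fl_diff = Context(prec=t, rounding=ROUND_HALF_EVEN).subtract(x, y)

rel_err = abs(fl_diff - exact) / abs(exact)
bound = Decimal(beta) ** (1 - t) / 2  # (1/2) * beta^(-t+1)

print("x - y     =", exact)
print("fl(x - y) =", fl_diff)
print("relative error =", rel_err)
print("bound          =", bound)
print("bound holds:", rel_err <= bound)
```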
5. Consider a floating point arithmetic system with base $\beta = 2$ and $t = 3$ digits in the significand.
Compute $(2^{1} \times 0.100 - 2^{0} \times 0.111)$.
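For problem 5, exact rational arithmetic gives a reference value to compare against. The sketch below also simulates one possible machine behaviour: aligning the smaller operand to the larger exponent and chopping it to 3 digits, that is, subtraction without a guard digit. The sheet does not say whether a guard digit is available, so that part is an assumption, and the helpers `sig` and `chop` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

def sig(bits: str) -> Fraction:
    """Value of a binary significand written as '0.d1d2d3'."""
    frac = bits.split(".")[1]
    return Fraction(int(frac, 2), 2 ** len(frac))

def chop(value: Fraction, t: int) -> Fraction:
    """Truncate a non-negative value to t binary fraction digits."""
    scaled = value * 2**t
    return Fraction(scaled.numerator // scaled.denominator, 2**t)

t = 3
x = 2**1 * sig("0.100")  # first operand
y = 2**0 * sig("0.111")  # second operand
exact = x - y            # exact difference, no rounding

# Assumed no-guard-digit model: shift y's significand to exponent 1,
# keep only t digits, then subtract the significands.
y_aligned = chop(sig("0.111") / 2, t)
no_guard = 2**1 * (sig("0.100") - y_aligned)

print("exact difference:    ", exact)
print("without a guard digit:", no_guard)
```

Comparing the two printed values shows how much the alignment step alone can cost when no extra digit is kept.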
6. Consider the representation of floating point numbers in the IEEE 754 standard. Find
(i) The largest number in magnitude representable for single precision.
(ii) The largest number in magnitude representable for double precision.
(iii) The smallest number in magnitude representable for single precision.
(iv) The smallest number in magnitude representable for double precision.
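Answers to problem 6 derived from the format parameters can be compared with the values a machine actually stores. The sketch below assumes that Python's `float` is IEEE 754 double precision (true on common platforms) and decodes single precision bit patterns with `struct`. The sheet does not say whether "smallest" means the smallest normalized or the smallest subnormal positive number, so both are shown; `from_bits32` is a hypothetical helper.

```python
import math
import struct
import sys

def from_bits32(hex_bits: str) -> float:
    """Decode an 8-hex-digit bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", bytes.fromhex(hex_bits))[0]

# Single precision: 8 exponent bits, 23 stored fraction bits.
single_max = from_bits32("7f7fffff")            # largest finite value
single_min_normal = from_bits32("00800000")     # smallest positive normalized value
single_min_subnormal = from_bits32("00000001")  # smallest positive subnormal value

# Double precision: 11 exponent bits, 52 stored fraction bits.
double_max = sys.float_info.max
double_min_normal = sys.float_info.min
double_min_subnormal = math.ldexp(1.0, -1074)

for name, value in [
    ("single max", single_max),
    ("single min normal", single_min_normal),
    ("single min subnormal", single_min_subnormal),
    ("double max", double_max),
    ("double min normal", double_min_normal),
    ("double min subnormal", double_min_subnormal),
]:
    print(f"{name:22s} {value!r}")
```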
8. Suppose we compute the sum $\sum_{j=1}^{n} w_j$ using floating point arithmetic with unit roundoff $u$. Then
$$fl\left( \sum_{j=1}^{n} w_j \right) = \sum_{j=1}^{n} w_j (1 + \gamma_j),$$
where $|\gamma_j| \le (n - 1)u + O(u^2)$, regardless of the order in which the terms are accumulated.
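The statement in problem 8 implies that the absolute error of the computed sum is at most about (n − 1)u times the sum of the |w_j|, whatever the order of accumulation. The sketch below tests this empirically for a few orderings. It assumes IEEE 754 double precision (u = 2^-53), uses `fractions.Fraction` for an exact reference sum, and replaces the O(u^2) term with the closed form (n − 1)u / (1 − (n − 1)u), a standard way of writing this kind of bound. The data and names are made up for illustration.

```python
import random
from fractions import Fraction

def recursive_sum(values):
    """Left-to-right floating point summation (n - 1 rounded additions)."""
    total = 0.0
    for v in values:
        total += v
    return total

random.seed(0)
n = 1000
w = [random.uniform(-1.0, 1.0) for _ in range(n)]

u = Fraction(1, 2**53)  # unit roundoff for double precision
exact = sum(Fraction(v) for v in w)         # exact sum of the stored values
abs_sum = sum(abs(Fraction(v)) for v in w)  # sum of |w_j|
gamma = (n - 1) * u / (1 - (n - 1) * u)     # (n - 1)u + O(u^2)
bound = gamma * abs_sum

orders = {
    "as generated": list(w),
    "ascending": sorted(w),
    "decreasing magnitude": sorted(w, key=abs, reverse=True),
}
errors = {
    name: abs(Fraction(recursive_sum(vals)) - exact)
    for name, vals in orders.items()
}

for name, err in errors.items():
    print(f"{name:22s} error = {float(err):.3e}  (bound = {float(bound):.3e})")
```

The observed errors are typically far below the bound, which is a worst-case estimate; the ordering changes the actual error but not the guarantee.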