
Chapter 3.

Approximation and Round-Off Errors

Example: a speedometer reading of 48.X and an odometer reading of 87,324.4X; in each case the last digit (X) is estimated.

Significant Figures
The number of significant figures indicates precision. Significant digits of a number are those that can be used with confidence, i.e., the certain digits plus one estimated digit.

53,800: how many significant figures?
5.38 × 10^4 → 3
5.380 × 10^4 → 4
5.3800 × 10^4 → 5

Zeros used only to locate the decimal point are not significant figures:
0.00001753 → 4 significant figures
0.0001753 → 4 significant figures
0.001753 → 4 significant figures
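To make the count explicit, scientific-notation formatting shows exactly how many figures are being claimed; a minimal Python sketch using the value from the example above:

```python
# Writing a number in scientific notation makes its significant figures explicit.
value = 53800
for sig in (3, 4, 5):
    # (sig - 1) digits after the decimal point, e.g. 5.38e+04 for 3 significant figures
    print(f"{value:.{sig - 1}e}  ->  {sig} significant figures")
```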

Approximations and Round-Off Errors


For many engineering problems there are no analytical solutions; numerical methods yield approximate results, and we cannot compute the associated errors exactly.
The given data are only rarely exact, since they originate from measurements, so there is almost certainly error in the input information. The algorithm itself usually introduces errors as well, e.g., unavoidable round-off. The output information therefore contains error from both of these sources.

How confident are we in our approximate result? The question is how much error is present in our calculation and whether it is tolerable.

Accuracy: how close a computed or measured value is to the true value.
Precision (or reproducibility): how close a computed or measured value is to previously computed or measured values.
Inaccuracy (or bias): a systematic deviation from the actual value.
Imprecision (or uncertainty): the magnitude of scatter.

Fig 3.2

Error Definitions

True value = approximation + error

True error: Et = true value − approximation

True fractional relative error = true error / true value

True percent relative error: εt = (true error / true value) × 100%

For numerical methods, the true value is known only when we deal with functions that can be solved analytically (simple systems). In real-world applications we usually do not know the answer a priori. Then:

Approximate percent relative error:
εa = (approximate error / approximation) × 100%

For an iterative approach (e.g., Newton's method):
εa = ((current approximation − previous approximation) / current approximation) × 100%

Use the absolute value. Computations are repeated until the stopping criterion is satisfied:

|εa| < εs

where εs is a pre-specified percent tolerance based on knowledge of your solution. If the criterion

εs = 0.5 × 10^(−n) (as a fraction) = (0.5 × 10^(2−n)) %

is met, you can be sure that the result is correct to at least n significant figures.
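For illustration, a minimal Python sketch of this stopping test, applied to the Maclaurin series for e^x with x = 0.5 (the series and the test value are my own choices, not taken from the slides):

```python
import math

def exp_series(x, n_sig=3):
    """Approximate e**x with the Maclaurin series, stopping when |eps_a| < eps_s."""
    eps_s = 0.5 * 10 ** (2 - n_sig)           # tolerance in percent, (0.5 x 10^(2-n)) %
    total, term, k = 1.0, 1.0, 0              # first term of the series is 1
    while True:
        k += 1
        term *= x / k                         # next term: x^k / k!
        previous, total = total, total + term
        eps_a = abs((total - previous) / total) * 100.0   # approximate percent error
        if eps_a < eps_s:
            return total, k

approx, terms = exp_series(0.5, n_sig=3)
print(approx, terms, math.exp(0.5))  # approximation, number of terms used, true value
```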

Fig 3.3 decimal, binary

Fig 3.4 Signed binary


Signed magnitude: 1000 0000 0000 0001 = −1
Two's complement:
0000 0000 0000 0001 = 1
0000 0000 0000 0000 = 0
1111 1111 1111 1111 = −1
1111 1111 1111 1110 = −2
Number range? How to compute −a from a?
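A small Python sketch of 16-bit two's-complement arithmetic; it answers the range question and shows the negate-by-inverting-and-adding-one rule (the helper names are mine):

```python
BITS = 16
MASK = (1 << BITS) - 1                     # 0xFFFF

def to_twos_complement(a):
    """16-bit two's-complement bit pattern of the integer a."""
    return a & MASK

def from_twos_complement(bits):
    """Interpret a 16-bit pattern as a signed integer."""
    return bits - (1 << BITS) if bits & (1 << (BITS - 1)) else bits

for a in (1, 0, -1, -2):
    print(f"{a:6d} -> {to_twos_complement(a):016b}")

# Number range of a signed 16-bit integer:
print(-(1 << (BITS - 1)), (1 << (BITS - 1)) - 1)      # -32768 .. 32767

# -a from a: invert all the bits, then add 1
a = 1
neg_a = ((~a & MASK) + 1) & MASK
print(f"{neg_a:016b}", from_twos_complement(neg_a))   # 1111111111111111 -> -1
```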

Fractional number decimal, binary


123.456 = 1×10^2 + 2×10^1 + 3×10^0 + 4×10^−1 + 5×10^−2 + 6×10^−3


101.011 = 1×2^2 + 0×2^1 + 1×2^0 + 0×2^−1 + 1×2^−2 + 1×2^−3 = 4 + 1 + 0.25 + 0.125 = 5.375
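A minimal Python sketch that evaluates a binary string with a fractional part the same way, reproducing 101.011 = 5.375:

```python
def binary_to_decimal(s):
    """Evaluate a base-2 string such as '101.011' digit by digit."""
    whole, _, frac = s.partition(".")
    value = 0.0
    for i, digit in enumerate(reversed(whole)):      # weights 2^0, 2^1, 2^2, ...
        value += int(digit) * 2 ** i
    for i, digit in enumerate(frac, start=1):        # weights 2^-1, 2^-2, 2^-3, ...
        value += int(digit) * 2 ** -i
    return value

print(binary_to_decimal("101.011"))   # 5.375
```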

Fixed point number


Floating point number (base-10): 156.78 → 0.15678 × 10^3 in a floating-point base-10 system


1/34 = 0.029411765… Suppose only 4 decimal places of the mantissa can be stored: 1/34 ≅ 0.0294 × 10^0, where the mantissa must satisfy 1/10 ≤ m < 1.
Normalize to remove the leading zero: multiply the mantissa by 10 and lower the exponent by 1, giving 0.2941 × 10^−1. An additional significant figure is retained.
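A short sketch of this normalization on a hypothetical base-10 machine with a 4-digit mantissa (the function name and chopping behaviour are assumptions for illustration):

```python
import math

def normalize_base10(x, digits=4):
    """Return (mantissa, exponent) with 0.1 <= mantissa < 1, mantissa chopped to `digits` places."""
    exponent = math.floor(math.log10(abs(x))) + 1     # shift so the mantissa starts 0.d...
    mantissa = x / 10 ** exponent
    mantissa = math.trunc(mantissa * 10 ** digits) / 10 ** digits   # chop the extra digits
    return mantissa, exponent

# Storing 1/34 unnormalized would keep only 0.0294; normalizing retains the extra figure.
print(normalize_base10(1 / 34))    # (0.2941, -1)  ->  0.2941 x 10^-1
```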

Floating point number (binary, base-2)


101.011 = 0.101011 × 2^3
0.00101101 = 0.101101 × 2^−2

Fig 3.5

Floating point number. Quantities such as π, e, or √7 cannot be expressed by a fixed number of significant figures. Moreover, because computers use a base-2 representation, they cannot precisely represent certain exact base-10 numbers. Fractional quantities are therefore typically represented in a computer in floating-point form, e.g.,


m × b^e
where m is the mantissa, b is the base of the number system used, and e is the exponent.

Floating point number


1/b ≤ m < 1


Therefore:
for a base-10 system: 0.1 ≤ m < 1
for a base-2 system: 0.5 ≤ m < 1

Floating-point representation can handle both fractional quantities and very large numbers. However:
Floating-point numbers take up more room and take longer to process than integers (historically a drawback, much less so on modern hardware).
Round-off errors are introduced because the mantissa holds only a finite number of significant figures.

Chopping vs. rounding. Example: π = 3.14159265358… is to be stored on a base-10 system carrying 7 significant digits.
Chopping: π ≈ 3.141592, true error ≈ 0.00000065
Rounding: π ≈ 3.141593, true error ≈ 0.00000035
Some machines use chopping because rounding adds computational overhead; when the number of significant figures carried is large enough, the resulting chopping error is negligible.
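A quick Python check of these figures, using a small chopping helper of my own (Python's round supplies the rounding case):

```python
import math

def chop(x, sig):
    """Truncate x to `sig` significant decimal digits (round toward zero)."""
    shift = sig - 1 - math.floor(math.log10(abs(x)))
    return math.trunc(x * 10 ** shift) / 10 ** shift

pi = math.pi                              # 3.14159265358...
chopped = chop(pi, 7)                     # 3.141592
rounded = round(pi, 6)                    # 3.141593 (6 decimals = 7 significant digits here)
print(chopped, abs(pi - chopped))         # true error ~6.5e-7
print(rounded, abs(pi - rounded))         # true error ~3.5e-7
```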


Fig 3.6 - example


Example 3.4 (p. 61)


2^−3 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 0.062500 (the smallest)
2^−3 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 0.078125
2^−3 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 0.093750
2^−3 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 0.109375
Evenly spaced by 2^−3 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.015625

2^−2 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 0.125000
2^−2 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 0.156250
2^−2 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 0.187500
2^−2 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 0.218750
Evenly spaced by 2^−2 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.031250


Example 3.4 (p. 61)


2^2 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 2
2^2 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 2.5
2^2 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 3
2^2 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 3.5
Evenly spaced by 2^2 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.5

2^3 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 4
2^3 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 5
2^3 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 6
2^3 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 7 (the largest)
Evenly spaced by 2^3 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 1
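A short Python sketch that enumerates the positive numbers of this hypothetical machine, assuming a normalized 3-bit mantissa (leading bit 1) and integer exponents from −3 to 3, which matches the extremes 0.0625 and 7 listed above:

```python
# Enumerate the positive values of a toy base-2 floating-point system:
# value = 2**e * (1*2**-1 + b2*2**-2 + b3*2**-3), normalized so the leading mantissa bit is 1.
values = []
for e in range(-3, 4):                        # assumed exponent range -3 .. 3
    for b2 in (0, 1):
        for b3 in (0, 1):
            mantissa = 0.5 + b2 * 0.25 + b3 * 0.125
            values.append(2 ** e * mantissa)

print(min(values), max(values))               # 0.0625 and 7.0
# Spacing is uniform within one exponent (2**e * 2**-3), then doubles at the next exponent:
print([round(v, 6) for v in sorted(values)[:8]])
```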


Fig 3.7


IEEE Standard 754 Floating Point Numbers


Single precision (32-bit): sign 1 bit, exponent 8 bits, mantissa 23 bits; about 7 significant base-10 digits, with a range of roughly 10^-38 to 10^38.
Double precision (64-bit): sign 1 bit, exponent 11 bits, mantissa 52 bits; 15-16 significant base-10 digits, with a range of roughly 10^-308 to 10^308.
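These limits can be inspected directly in Python, whose float is an IEEE 754 double; the 32-bit round-trip helper below is only an illustration:

```python
import struct
import sys

# Double precision (binary64): Python's built-in float
print(sys.float_info.dig)      # 15        significant base-10 digits
print(sys.float_info.min)      # ~2.2e-308 smallest normalized positive double
print(sys.float_info.max)      # ~1.8e+308 largest finite double

def to_float32(x):
    """Round-trip x through IEEE 754 single precision (binary32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(3.141592653589793))   # ~3.1415927  only about 7 digits survive
```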


Arithmetic Manipulations


Common arithmetic operations: addition. The mantissa of the number with the smaller exponent is adjusted so that the two exponents are the same.

0.1557 × 10^1 + 0.4381 × 10^−1

0.4381 × 10^−1 → 0.004381 × 10^1

0.1557 × 10^1 + 0.004381 × 10^1 = 0.160081 × 10^1

Chopped to four significant digits: 0.1600 × 10^1
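The same addition can be mimicked with Python's decimal module, using a 4-significant-digit context that chops (ROUND_DOWN) as a stand-in for the slide's hypothetical machine:

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)   # 4-digit mantissa, chopping

a = Decimal("0.1557E+1")
b = Decimal("0.4381E-1")
print(machine.add(a, b))    # 1.600  (exact sum 1.60081, last digits chopped away)
```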

Arithmetic Manipulations
Subtraction: 0.3641 × 10^2 − 0.2686 × 10^2

0.3641 × 10^2 − 0.2686 × 10^2 = 0.0955 × 10^2

Normalized: 0.9550 × 10^1

Arithmetic Manipulations


Subtraction: 0.7642 × 10^3 − 0.7641 × 10^3

0.7642 × 10^3 − 0.7641 × 10^3 = 0.0001 × 10^3

Normalized: 0.1000 × 10^0. The three zeros appended to the mantissa are not significant figures; significance has been lost in the subtraction.
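Both subtractions in the same assumed 4-digit chopping context; the second shows how the result keeps only one meaningful digit:

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)

# 0.3641e2 - 0.2686e2: all digits of the result are meaningful
print(machine.subtract(Decimal("0.3641E+2"), Decimal("0.2686E+2")))   # 9.55  (0.9550 x 10^1)

# 0.7642e3 - 0.7641e3: only the leading 1 is meaningful; the stored 0.1000 pads in zeros
print(machine.subtract(Decimal("0.7642E+3"), Decimal("0.7641E+3")))   # 0.1   (0.1000 x 10^0)
```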

Arithmetic Manipulations


Multiplication: 0.1363 × 10^3 × 0.6423 × 10^−1

Exponents are added and mantissas are multiplied:
0.1363 × 10^3 × 0.6423 × 10^−1 = 0.08754549 × 10^2

Normalized: 0.8754549 × 10^1
Chopped to four significant digits: 0.8754 × 10^1
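The multiplication in the same assumed 4-digit chopping context:

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)

a = Decimal("0.1363E+3")      # 136.3
b = Decimal("0.6423E-1")      # 0.06423
print(a * b)                  # 8.754549  (exact product, 0.8754549 x 10^1)
print(machine.multiply(a, b)) # 8.754     (chopped to four significant digits)
```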

Errors


Adding a large and a small number: 0.4000 × 10^4 + 0.1000 × 10^−2

0.1000 × 10^−2 → 0.0000001 × 10^4

0.4000 × 10^4 + 0.0000001 × 10^4 = 0.4000001 × 10^4

Chopped to four significant digits: 0.4000 × 10^4. The small number is lost entirely.
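The same effect in code: in the assumed 4-digit chopping machine the small addend vanishes, and ordinary doubles behave the same way once the magnitudes differ by more than about 16 decimal digits:

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)
print(machine.add(Decimal("0.4000E+4"), Decimal("0.1000E-2")))   # 4000  the small term is lost

# The same phenomenon with IEEE 754 doubles (about 16 significant digits):
print(1.0e15 + 1.0 == 1.0e15)    # False: the sum still fits exactly
print(1.0e17 + 1.0 == 1.0e17)    # True: the added 1.0 is completely absorbed
```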

Errors


Subtractive cancellation. If b^2 >> 4ac in the quadratic formula

x = (−b ± √(b^2 − 4ac)) / (2a)

then √(b^2 − 4ac) ≈ |b|, and for one of the roots −b + √(b^2 − 4ac) is the difference of two nearly equal numbers, so the leading digits cancel:

0.12345678 − 0.12345666 = 0.00000012
0.12345??? − 0.12345??? = 0.00000??? (with limited precision, only the uncertain digits remain)
