Chapter 1: Errors in Numerical Computations
1. Introduction
In this chapter, we first present some mathematical preliminary theorems that are
invariably useful in the study of numerical analysis. The chapter then presents the
various kinds of errors that may occur in a problem, together with the representation
of numbers used in computation. Finally, the concepts of stability and conditioning of
problems are introduced.
2. Mathematical preliminaries
Let f(x) be a real valued continuous function on the finite interval [a, b], and define

m = min_{a ≤ x ≤ b} f(x),  M = max_{a ≤ x ≤ b} f(x).

Then for any number μ in [m, M], there exists at least one point ξ in [a, b] for which
f(ξ) = μ (the intermediate value theorem).
Let f(x) be a real valued continuous function on the finite interval [a, b] and differentiable in
(a, b). Then there exists at least one point ξ in (a, b) such that f(b) − f(a) = (b − a) f′(ξ)
(the mean value theorem).
Let w(x) be nonnegative and integrable on [a, b], and let f(x) be continuous on [a, b]. Then

∫_a^b w(x) f(x) dx = f(ξ) ∫_a^b w(x) dx

for some ξ in [a, b] (the mean value theorem for integrals).
One of the most important and useful tools in numerical analysis for approximating functions
f(x) by polynomials is Taylor's theorem and the associated Taylor series. These are stated
below.
Let f(x) be a real valued continuous function on the finite interval [a, b] having n + 1
continuous derivatives on [a, b] for some n ≥ 0, and let x, x_0 ∈ [a, b]. Then

f(x) = P_n(x) + R_{n+1}(x),

where

P_n(x) = f(x_0) + ((x − x_0)/1!) f′(x_0) + ... + ((x − x_0)^n / n!) f^(n)(x_0)

and

R_{n+1}(x) = ((x − x_0)^{n+1} / (n + 1)!) f^(n+1)(ξ),  a < ξ < b.
Taylor's theorem extends to functions of two variables:

f(a + h, b + k) = f(a, b) + (h ∂/∂x + k ∂/∂y) f(a, b) + (1/2!) (h ∂/∂x + k ∂/∂y)^2 f(a, b)
+ ... + (1/(n − 1)!) (h ∂/∂x + k ∂/∂y)^{n−1} f(a, b) + R_n,

where

R_n = (1/n!) (h ∂/∂x + k ∂/∂y)^n f(a + θh, b + θk),  0 < θ < 1.

R_n is called the remainder after n terms, and the theorem is called Taylor's theorem with
Lagrange's form of the remainder.
Significant figures:
In the number 0.00134, the significant figures are 1, 3, 4. The zeros are used here merely to
fix the decimal point and are therefore not significant. But in the number 0.1204, the significant
figures are 1, 2, 0, 4.
Rule 1: All non-zero digits are significant.
Rule 2: Zeros between non-zero digits are significant, e.g., in reading the measurement 9.04
cm, the zero represents a measured quantity, just as 9 and 4 do, and is therefore a significant
figure. Similarly, there are four significant figures in the number 1005.
Rule 3: Zeros to the left of the first non-zero digit in a number are not significant, e.g.,
0.0026. Also, in the measurement 0.07 kg, the zeros are used merely to locate the decimal
point and are not significant.
Rule 4: When a number ends in zeros that are to the right of the decimal point, the zeros are
significant, e.g., in the number 0.0200, there are three significant figures. Another example
is that in reading the measurement 11.30 cm, the final zero is an estimate and represents a
measured quantity. It is therefore significant. Thus, zeros to the right of the decimal point
that follow a non-zero digit are significant.
Rule 5: When a number ends in zeros that are not to the right of the decimal point, the zeros
are not necessarily significant, e.g., if a distance is reported as 1200 feet, one may assume
two significant figures. However, reporting measurements in scientific notation removes all
doubt, since all digits written in scientific notation are considered significant.
Thus, we may conclude that if a zero represents a measured quantity, it is a significant figure.
The following is the general rule for rounding off a number to n significant digits:
Discard all digits to the right of the n-th place. If the discarded number is less than half a unit
in the n-th place, leave the n-th digit unchanged; if the discarded number is greater than half a
unit in the n-th place, add 1 to the n-th digit. If the discarded number is exactly half a unit in
the n-th place, leave the n-th digit unaltered if it is even, but increase it by 1 if it is odd.
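The rule above, including the even-digit tie-break, can be sketched in Python with the standard decimal module, whose ROUND_HALF_EVEN mode implements exactly this tie rule. The helper name round_significant is our own:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_significant(x, n):
    """Round x to n significant digits; ties (exactly half a unit in the
    n-th place) go to the even digit, as in the rule stated above."""
    if x == 0:
        return Decimal(0)
    d = Decimal(str(x))                    # str() recovers the shortest decimal form
    shift = d.adjusted()                   # position of the most significant digit
    quantum = Decimal(1).scaleb(shift - n + 1)   # 10**(shift - n + 1)
    return d.quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_significant(0.51465, 4))   # 0.5146  (tie, 6 is even: unchanged)
print(round_significant(0.51475, 4))   # 0.5148  (tie, 7 is odd: increased)
print(round_significant(1.2346, 4))    # 1.235   (discarded part > half a unit)
```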
When a number is rounded off according to the rule just stated, it is said to be
correct to n significant digits.
We now proceed to present the classification of the ways by which error is involved into the
numerical computation. Let us start with some simple definitions about error.
Let x_T be the exact value or true value of a number and x_A be its approximate value. Then
the absolute error is defined by

E_a ≡ x_T − x_A.

The relative error is defined by

E_r ≡ (x_T − x_A) / x_T,  provided x_T ≠ 0 or x_T is not close to zero,

and the percentage error by

E_p ≡ E_r × 100 = ((x_T − x_A) / x_T) × 100,  provided x_T ≠ 0 or x_T is not close to zero.
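These three definitions translate directly into code; a minimal sketch (the function name errors is our own):

```python
def errors(x_true, x_approx):
    """Absolute, relative and percentage error, per the definitions above."""
    e_a = x_true - x_approx        # E_a = x_T - x_A
    e_r = e_a / x_true             # E_r, undefined when x_T is (near) zero
    return e_a, e_r, 100.0 * e_r   # E_p = 100 * E_r

# approximating 1/3 by 0.333: relative error is about 0.001, i.e. 0.1 %
ea, er, ep = errors(1/3, 0.333)
```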
The inherent error is that quantity which is already present in the statement of the problem
before its solution. The inherent error arises either due to the simplified assumptions in the
mathematical formulation of the problem or due to errors in the physical measurements of the
parameters of the problem. Inherent error cannot be completely eliminated, but it can be
minimized if we select better data or use higher-precision computation.

When a number x is rounded to d decimal places, the rounded value x^(d) satisfies

|x − x^(d)| ≤ (1/2) × 10^(−d).  (4.1)

The error arising out of rounding of a number, as defined in eq. (4.1), is known as round-off
error.
Let x be an arbitrary given real number which has the representation

x = ± (.d_1 d_2 ... d_k d_{k+1} ...) × b^e,  (4.2)

where b is the base, d_1 ≠ 0, the digits d_i are integers satisfying 0 ≤ d_i ≤ b − 1, and the
exponent e is such that e_min ≤ e ≤ e_max. The fractional part .d_1 d_2 ... d_k d_{k+1} ... is
called the mantissa, and it lies between −1 and 1.
Now, the floating-point number fl(x) in k-digit mantissa standard form can be obtained in
the following two ways:
(a) Chopping: In this case, we simply discard the digits d_{k+1}, d_{k+2}, ... in eq. (4.2) and
obtain fl(x) = ± (.d_1 d_2 ... d_k) × b^e.
(b) Rounding: In this case, fl(x) is chosen as the k-digit number nearest to x, together with
the rule of symmetric rounding, according to which, if the truncated part is exactly half a unit
in the k-th position, then if the k-th digit is odd, 1 is added to it; if it is even, it is
left unchanged.
Thus the relative error for the k-digit mantissa standard form representation of x becomes

|x − fl(x)| / |x| ≤ b^(1−k) for chopping, and |x − fl(x)| / |x| ≤ (1/2) b^(1−k) for rounding.

Therefore, the bound on the relative error of a floating point number is reduced by half when
rounding is used instead of chopping. For this reason, rounding is used on most computers.
Example 1:
Approximate values of 1/6 and 1/13, correct to 4 decimal places, are 0.1667 and 0.0769
respectively. Find the possible relative error and absolute error in the sum of 0.1667 and
0.0769.
Solution:
The maximum absolute error in each of the values 0.1667 and 0.0769 is
(1/2) × 10^(−4) = 0.00005, so the absolute error in the sum is at most
0.00005 + 0.00005 = 0.0001. The relative error satisfies

E_r[(x + y)_A] = |(x + y)_T − (x + y)_A| / (x + y)_T
≤ E_a(x)/(x + y)_T + E_a(y)/(x + y)_T
≤ 0.00005/0.1667 + 0.00005/0.0769 ≤ 0.000950135.
Example 2:
If the number π = 4 tan^(−1)(1) is approximated using 5 significant digits, find the percentage
relative error due to
(i) Chopping,
(ii) Rounding.
Solution:
(i) Chopping gives fl(π) = 3.1415, so the percentage error is

|π − 3.1415| / π × 100 = 0.00294926%.

(ii) Rounding gives fl(π) = 3.1416, so the percentage error is

|π − 3.1416| / π × 100 = 0.000233843%.
From the above errors, it may be easily observed that rounding reduces error.
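Example 2 can be checked numerically. The sketch below builds 5-significant-digit chopped and rounded values of π by scaling the mantissa; the helper names fl_chop and fl_round are our own, and Python's built-in round already applies the symmetric round-half-to-even rule:

```python
import math

def fl_chop(x, k, b=10):
    """k-digit base-b floating point representation of x > 0 by chopping."""
    e = math.floor(math.log10(x)) + 1     # exponent putting the mantissa in [1/b, 1)
    m = x / b**e
    return math.trunc(m * b**k) / b**k * b**e

def fl_round(x, k, b=10):
    """Same, but with symmetric rounding of the k-th digit."""
    e = math.floor(math.log10(x)) + 1
    m = x / b**e
    return round(m * b**k) / b**k * b**e

pc = fl_chop(math.pi, 5)    # 3.1415
pr = fl_round(math.pi, 5)   # 3.1416
err_chop = abs(math.pi - pc) / math.pi * 100    # about 0.00294926 %
err_round = abs(math.pi - pr) / math.pi * 100   # about 0.000233843 %
```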
5. Truncation errors
Truncation errors result from the approximate formulae used in computations, which are
generally based on truncated series. The study of this error is usually associated with the
problem of convergence.
For example, let us assume that a function f(x) and all its higher order derivatives with
respect to the independent variable x at a point, say x = x_0, are known. Now, in order to
find the function value at a neighbouring point, say x = x_0 + ∆x, one can use the Taylor
series expansion

f(x) = f(x_0) + (x − x_0) f′(x_0) + ((x − x_0)^2 / 2!) f″(x_0) + ...  (5.1)

The right hand side of the above equation is an infinite series, and one has to truncate it
after some finite number of terms to calculate f(x_0 + ∆x), either with a computer or by
manual calculation.
If the series is truncated after n terms, then it is equivalent to approximating f(x) with a
polynomial of degree n − 1:

f(x) ≈ P_{n−1}(x) = f(x_0) + (x − x_0) f′(x_0) + ((x − x_0)^2 / 2!) f″(x_0) + ...
+ ((x − x_0)^{n−1} / (n − 1)!) f^(n−1)(x_0),  (5.2)

and the truncation error is

E_T(x) = f(x) − P_{n−1}(x) = ((x − x_0)^n / n!) f^(n)(ξ).  (5.3)
Now, let

M_n(x) = max_{a ≤ ξ ≤ x} |f^(n)(ξ)|.  (5.4)

Then

|E_T(x)| ≤ M_n(x) |x − x_0|^n / n!.  (5.5)
Hence, from eq. (5.2), an approximate value of f(x_0 + ∆x) can be obtained with the truncation
error bounded by eq. (5.5).
Example 3:
Expand f(x) = (1 + x)^(2/3), x ∈ [0, 0.1], in a Taylor series about x = 0. Use the expansion
to approximate f(0.04) and bound the truncation error.
Solution:
We have

f(x) = (1 + x)^(2/3),  f(0) = 1,
f′(x) = 2 / (3(1 + x)^(1/3)),  f′(0) = 2/3,
f″(x) = −2 / (9(1 + x)^(4/3)),  f″(0) = −2/9,
f‴(x) = 8 / (27(1 + x)^(7/3)).
Thus, the Taylor series expansion with the remainder term is given by

(1 + x)^(2/3) = 1 + (2/3)x − x^2/9 + (4x^3/81) (1 + ξ)^(−7/3),  0 < ξ < 0.1,

so the truncation error is

E_T(x) = (1 + x)^(2/3) − (1 + (2/3)x − x^2/9) = (4x^3/81) (1 + ξ)^(−7/3).

Hence

f(0.04) ≈ 1 + (2/3)(0.04) − (0.04)^2/9 = 1.026488888889, correct to 12 decimal places,

and

|E_T| ≤ max_{0 ≤ ξ ≤ 0.1} (4(0.1)^3 / 81) (1 + ξ)^(−7/3) ≤ (4/81)(0.1)^3 = 0.493827 × 10^(−4).
The exact value of f(0.04) correct up to 12 decimal places is 1.026491977549.
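A quick numerical check of Example 3, in plain Python:

```python
f = lambda x: (1 + x) ** (2 / 3)
P2 = lambda x: 1 + (2 / 3) * x - x**2 / 9   # quadratic Taylor polynomial about 0

approx = P2(0.04)              # about 1.026488888889
exact = f(0.04)                # about 1.026491977549
bound = (4 / 81) * 0.1**3      # bound (5.5) on [0, 0.1], about 0.4938e-4

# the actual error is well within the truncation error bound
assert abs(exact - approx) <= bound
```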
6. Floating point representation of numbers
A k-digit floating point number has the form

± (.d_1 d_2 ... d_k) × b^e,

where b is the base, k is the number of significant digits or bits, which indicates the
precision of the number, and the exponent e is such that e_min ≤ e ≤ e_max. The fractional
part .d_1 d_2 ... d_k is called the mantissa.
In numerical computation, nowadays digital computers are usually used. Most digital
computers use the binary number system, with base b = 2.
The fundamental unit of data stored in a computer memory is called a computer word. The
number of bits a word can hold is called the word length. The word length is fixed for a computer
although it varies from computer to computer. The typical word lengths are 16, 32, 64 bits or
higher bits. The largest number that can be stored in a computer depends on word length. To
store a number in floating point representation, a computer word is divided into three fields.
The first field consists of one bit, called the sign bit. The next bits represent the exponent
and the final bits represent the mantissa. For example, in single-precision floating-point
format, a 32-bit word is divided into three fields as follows: 1 bit for the sign, 8 bits for
the exponent and 23 bits for the mantissa. The 8-bit exponent field is stored in biased form
and represents exponents from −126 to 127 for normalized numbers. On the other hand, in
double-precision floating-point format, a 64-bit word is divided into three fields as follows:
1 bit for the sign, 11 bits for the exponent and 52 bits for the mantissa.
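The field layout can be inspected directly. The sketch below unpacks a double-precision value into its three fields using Python's struct module; the helper name float_fields is our own:

```python
import struct

def float_fields(x):
    """Split an IEEE 754 double into (sign, biased exponent field, mantissa field)."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]   # raw 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11-bit biased exponent (bias 1023)
    mantissa = bits & ((1 << 52) - 1)     # 52-bit fraction
    return sign, exponent, mantissa

s, e, m = float_fields(1.0)    # (0, 1023, 0): 1.0 = +1.0 x 2^(1023 - 1023)
```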
In the normalized floating point representation, the exponent is so adjusted that the bit d_1
immediately after the binary point is always 1. Formally, a nonzero floating point number is
normalized if its mantissa satisfies

1/b ≤ mantissa < 1.
The range of the exponents that a typical computer can handle is very large. The following
table shows the approximate effective range of IEEE (Institute of Electrical and Electronics
Engineers) floating-point numbers:

Table 1: Effective range of IEEE floating-point numbers
Format               Approximate effective range
Single precision     ±1.2 × 10^(−38) to ±3.4 × 10^(38)
Double precision     ±2.2 × 10^(−308) to ±1.8 × 10^(308)
If in a numerical computation, a number lies outside the range, then the following cases arise:
(a) Overflow: It occurs when the number is larger than the above range specified in
Table 1.
(b) Underflow: It occurs when the number is smaller than the above range specified in
Table 1.
In case of underflow, the number is usually set to zero and the computation continues. But in
case of overflow, the computation usually halts or the result is set to infinity.
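Both cases are easy to provoke with Python's double-precision floats; a small illustrative sketch:

```python
import math
import sys

huge = sys.float_info.max    # largest finite double, about 1.8e308
tiny = sys.float_info.min    # smallest positive normalized double, about 2.2e-308

overflowed = huge * 2.0      # exceeds the representable range -> infinity
underflowed = tiny * 1e-20   # far below the representable range -> flushed to 0.0

assert math.isinf(overflowed)
assert underflowed == 0.0    # computation continues with zero
```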
7. Propagation of errors
In this section, we consider the effect of arithmetic operations which involve errors. Let x_A
and y_A be the approximate numbers used in the calculations. Suppose they are in error, with
true values x_T and y_T, so that x_A = x_T − ε_x and y_A = y_T − ε_y.
Case 1: (Multiplication)

x_T y_T − x_A y_A = x_T y_T − (x_T − ε_x)(y_T − ε_y) = x_T ε_y + y_T ε_x − ε_x ε_y.

Hence

E_r(x_A y_A) = (x_T y_T − x_A y_A) / (x_T y_T) = ε_x/x_T + ε_y/y_T − (ε_x/x_T)(ε_y/y_T),

so that

|E_r(x_A y_A)| ≤ |E_r(x_A)| + |E_r(y_A)|,  provided |E_r(x_A)|, |E_r(y_A)| << 1.
Case 2: (Division)
Proceeding with the same argument as in multiplication, we get

|E_r(x_A / y_A)| ≤ |E_r(x_A)| + |E_r(y_A)|,  provided |E_r(y_A)| << 1.
Case 3: (Addition and Subtraction)

(x_T ± y_T) − (x_A ± y_A) = (x_T − x_A) ± (y_T − y_A) = ε_x ± ε_y.

Hence

|E_a(x_A ± y_A)| ≤ |E_a(x_A)| + |E_a(y_A)|.
Notes:
(i) The relative error in a product is bounded by the sum of the relative errors in the
multiplicands, and the relative error in a quotient is bounded by the sum of the
relative errors in the dividend and divisor. The relative errors in multiplication or
division therefore do not propagate rapidly.
(ii) The absolute error in the sum or difference of two numbers is bounded by the sum
of the absolute errors in those numbers.
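Notes (i) and (ii) can be verified on a small example; the numbers below are arbitrary illustrations:

```python
xT, yT = 2.71828, 3.141    # "true" values
xA, yA = 2.718, 3.14159    # wait: keep approximations *less* precise than the true values
xA, yA = 2.718, 3.141      # approximations of xT = 2.71828, yT = 3.14159
xT, yT = 2.71828, 3.14159

rel = lambda t, a: abs(t - a) / abs(t)

# (i) product: relative error bounded by the sum of relative errors
assert rel(xT * yT, xA * yA) <= rel(xT, xA) + rel(yT, yA)
# (i) quotient: same bound, to first order
assert rel(xT / yT, xA / yA) <= rel(xT, xA) + rel(yT, yA)
# (ii) sum: absolute error bounded by the sum of absolute errors
assert abs((xT + yT) - (xA + yA)) <= abs(xT - xA) + abs(yT - yA) + 1e-12
```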
8. General error formula
Let u be a function of several variables x_1, x_2, ..., x_n:

u = f(x_1, x_2, ..., x_n).  (8.1)

Suppose that ∆x_i represents the error in each x_i, so that the error in u is given by
u + ∆u = f(x_1 + ∆x_1, x_2 + ∆x_2, ..., x_n + ∆x_n).  (8.2)

Taylor series expansion of the right hand side of eq. (8.2) gives

u + ∆u = f(x_1, x_2, ..., x_n) + Σ_{i=1}^{n} (∂f/∂x_i) ∆x_i + O(∆x_i^2).  (8.3)
If we assume that the errors ∆x1 , ∆x2 , …, ∆xn are relatively very small, we can neglect the
second and higher powers of ∆xi . Thus from eq. (8.3), we get
∆u ≅ Σ_{i=1}^{n} (∂f/∂x_i) ∆x_i = (∂f/∂x_1) ∆x_1 + (∂f/∂x_2) ∆x_2 + ... + (∂f/∂x_n) ∆x_n.  (8.4)
This is the general formula for computing the error of a function u = f ( x1 , x2 ,..., xn ) .
The corresponding relative error is

E_r = ∆u/u ≅ (∂f/∂x_1)(∆x_1/f) + (∂f/∂x_2)(∆x_2/f) + ... + (∂f/∂x_n)(∆x_n/f).
Example 4:
If u = xyz^3 + (3/2) x^2 y^3 and the errors in x, y, z are 0.005, 0.001, 0.005 respectively
at x = 2, y = 1, z = 1, find the maximum relative error in computing u.
Solution:
Let

u = f(x, y, z) = xyz^3 + (3/2) x^2 y^3.

We have

∂f/∂x = yz^3 + 3xy^3,  ∂f/∂y = xz^3 + (9/2) x^2 y^2,  ∂f/∂z = 3xyz^2.

Hence

∆u ≅ (∂f/∂x) ∆x + (∂f/∂y) ∆y + (∂f/∂z) ∆z
= (yz^3 + 3xy^3) ∆x + (xz^3 + (9/2) x^2 y^2) ∆y + 3xyz^2 ∆z,

so that

|∆u| ≤ |(yz^3 + 3xy^3) ∆x| + |(xz^3 + (9/2) x^2 y^2) ∆y| + |3xyz^2 ∆z|
= 0.035 + 0.020 + 0.030 = 0.085.

Therefore

(E_r)_max = |∆u/u|_max ≈ 0.085/8 = 0.010625.