
NUMERICAL ANALYSIS

Lecture 2: Floating point arithmetic and Error Analysis

Rashidah Kasauli Namisanvu

Department of Networks
College of Computing and Information Sciences
Makerere University
rashidah.namisanvu@mak.ac.ug



Overview

1 Number Representation in Computers

2 Introduction to Errors

3 Error Sources

4 Additions & Subtractions

5 Multiplication & Division

6 Using Binomial Theorem



Numbers in a computer

Humans use the decimal number system (base 10) in daily life.

Most computers use the binary number system (base 2).
We enter numbers in decimal and the computer converts them to binary
using machine language.
(Scientific Notation). Let k be a real number. Then k can be written
in the following form:
k = m × 10^n
where m (the mantissa) is a real number (in normalised form, 1 ≤ |m| < 10)
and the exponent n is an integer. This notation is called scientific notation
or scientific form, and is sometimes referred to as standard form.
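
As a quick illustration of both points above (a minimal Python sketch, not part of the original slides): math.frexp exposes the machine's own mantissa-and-exponent form m × 2^n, and converting decimal input to binary can introduce a small representation error.

```python
import math
from decimal import Decimal

# Machine analogue of k = m x 10^n: frexp returns the binary mantissa and exponent.
print(math.frexp(6.5))    # (0.8125, 3), because 6.5 = 0.8125 * 2**3

# Decimal input is stored in binary, so some decimal values are only approximated:
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)   # False: both operands carry a tiny representation error
```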



Numbers in a computer: Examples

1 0.00000834 = 8.34 × 10^-6
2 25.45879 = 2.545879 × 10^1
3 3400000 = 3.4 × 10^6
4 33 = 3.3 × 10^1
5 2,300,000,000 = 2.3 × 10^9
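
These can be checked with Python's exponent format (a small sketch, not from the slides):

```python
for value in (0.00000834, 25.45879, 3400000, 33, 2300000000):
    print(format(value, ".6e"))   # 8.340000e-06, 2.545879e+01, 3.400000e+06, 3.300000e+01, 2.300000e+09
```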



Errors

Errors are unavoidable in the field of scientific computing.


Numerical analysts investigate the best ways to minimise this error.
The study of error, and of how to estimate and minimise it, is a
fundamental issue in error analysis.



Errors

Not all mathematical problems can be solved analytically.

2x + 5 = 8 is easy to solve
2√x + 5 = 8 is not as easy
Hence we use numerical rather than analytical approaches.
In numerical analysis, we approximate the exact solution of the
problem by using numerical methods, which introduces an error.
The numerical error is the difference between the exact solution and
the approximate solution.
(Numerical Error). Let x be the exact solution of the underlying
problem and x* its approximate solution. Then the error (denoted by
e) in solving this problem is

e = x − x*
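
To make this concrete, here is a minimal bisection sketch (illustrative only, not from the slides) that approximates the solution of 2√x + 5 = 8; the exact solution is x = 2.25, so the numerical error e = x − x* can be computed directly.

```python
import math

def f(x):
    return 2 * math.sqrt(x) + 5 - 8   # a root of f solves 2*sqrt(x) + 5 = 8

def bisect(f, a, b, tol=1e-6):
    """Bisection: repeatedly halve an interval [a, b] on which f changes sign."""
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

x_exact = 2.25                         # analytically: sqrt(x) = 3/2, so x = 9/4
x_approx = bisect(f, 0, 4)
print(x_approx, x_exact - x_approx)    # approximate solution and the error e = x - x*
```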



Errors

The term error represents the imprecision and inaccuracy of a
numerical computation.
Measurements and calculations can be characterised with regard to
their accuracy and precision.
Errors are unavoidable but need to be tracked.
As computations become more complex, so do the errors.
Hence we study how they propagate.
Accuracy
How close a value is to the true value
Precision
How close the values are to each other



Rounding and Truncation Errors

Rounding
Occurs due to a computing device's inability to deal with inexact
numbers, e.g. π = 3.14159265 · · · can be rounded to four significant
digits as π ≈ 3.142.
Maximum error in rounding is 0.5 × 10^-n, where n is the number of
rounded places.
Truncating
Occurs because some series (finite or infinite) is truncated/chopped to
a smaller number of terms, e.g. π = 3.14159265 · · · can be truncated
to four significant figures as π ≈ 3.141.
Maximum error in truncation is 1 × 10^-n.
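
A small Python check of the two bounds (illustrative, not from the slides), rounding and chopping π after three decimal places:

```python
import math

pi = math.pi                                  # 3.14159265...
rounded   = round(pi, 3)                      # 3.142 (rounded)
truncated = math.floor(pi * 10**3) / 10**3    # 3.141 (truncated/chopped)
print(abs(pi - rounded))     # 0.000407... <= 0.5 x 10^-3
print(abs(pi - truncated))   # 0.000592... <= 1.0 x 10^-3
```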



Error Propagation

Calculating the uncertainty or error of an approximation against the
actual value it is trying to approximate. It is represented as an
absolute error, showing how far off the approximation is, or as a
relative error, expressed as a proportion.
Absolute Error
δx = |x − x′|
Relative Error
RE = δx / x = |(x − x′) / x|
Percentage Error
PE = RE × 100 = |(x − x′) / x| × 100
Which is better?
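
In Python these three quantities are one-liners (a minimal sketch; the values of x and its approximation are arbitrary):

```python
x, x_approx = 3.141592653589793, 3.14   # true value and an approximation

abs_err = abs(x - x_approx)             # absolute error   δx = |x - x'|
rel_err = abs_err / abs(x)              # relative error   RE = δx / x
pct_err = rel_err * 100                 # percentage error PE = RE x 100
print(abs_err, rel_err, pct_err)        # ≈ 0.00159, 0.000507, 0.0507 %
```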



Errors in Addition & Subtraction

Theorem (Errors in Addition)


For variables x1 ± δx1 and x2 ± δx2, where δxi is the maximum possible
error, the maximum possible error when they are added is (δx1 + δx2).

Proof.
Since δx1 and δx2 are ±, the possible combinations on addition are
(δx1 + δx2), (δx1 − δx2), (−δx1 + δx2) and (−δx1 − δx2). The extremes
are the first and last. Hence the error of the sum lies in ±(δx1 + δx2).

Theorem (Errors in Subtraction)

For variables x1 ± δx1 and x2 ± δx2, where δxi is the maximum possible
error, the maximum possible error when they are subtracted is (δx1 + δx2).

Proof.
Same as addition.
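
A quick numerical check of both theorems (an illustrative sketch; the values and their errors are made up): enumerating every ± combination confirms that the worst case for the sum and for the difference is δx1 + δx2.

```python
from itertools import product

x1, dx1 = 2.30, 0.05     # made-up values with their maximum errors
x2, dx2 = 4.65, 0.005

worst_sum  = max(abs((x1 + s1*dx1) + (x2 + s2*dx2) - (x1 + x2))
                 for s1, s2 in product((-1, 1), repeat=2))
worst_diff = max(abs((x1 + s1*dx1) - (x2 + s2*dx2) - (x1 - x2))
                 for s1, s2 in product((-1, 1), repeat=2))
print(worst_sum, worst_diff, dx1 + dx2)   # all three ≈ 0.055 (up to floating point noise)
```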
Errors in Multiplication & Division

Theorem (Errors in Multiplication and Division)


The relative errors in division and multiplication are the same.

Proof.
Consider x1 ± δx1 and x2 ± δx2 . For multiplication, the absolute error is
(x1 + δx1 )(x2 + δx2 ) − x1 x2 = x1 δx2 + x2 δx1 (Note: δx1 δx2 tends to zero).
The relative error is (x1 δx2 + x2 δx1) / (x1 x2) = δx1/x1 + δx2/x2.
For division, the absolute error is
(x1 + δx1)/(x2 + δx2) − x1/x2 ≡ (x1 + δx1)/(x2 + δx2) × (x2 − δx2)/(x2 − δx2) − x1/x2,
which, neglecting second-order terms, simplifies to (x2 δx1 − x1 δx2) / x2^2.
The relative error is [(x2 δx1 − x1 δx2) / x2^2] / (x1/x2), which simplifies to
δx1/x1 − δx2/x2 ≡ δx1/x1 + δx2/x2 since the errors are ±.
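
A similar check for products and quotients (illustrative sketch with made-up values): the worst-case relative errors of x1·x2 and of x1/x2 both come out close to δx1/x1 + δx2/x2, differing only in second-order terms.

```python
x1, dx1 = 2.3, 0.05      # made-up values with their maximum errors
x2, dx2 = 4.65, 0.005

predicted = dx1/x1 + dx2/x2   # sum of the relative errors

prod_err = max(abs((x1 + s1*dx1) * (x2 + s2*dx2) - x1*x2) / (x1*x2)
               for s1 in (-1, 1) for s2 in (-1, 1))
quot_err = max(abs((x1 + s1*dx1) / (x2 + s2*dx2) - x1/x2) / (x1/x2)
               for s1 in (-1, 1) for s2 in (-1, 1))
print(predicted, prod_err, quot_err)   # all close to 0.0228
```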



Generic use of Binomial Theorem

We have looked at errors in simple cases of
addition/subtraction/multiplication/division.
We may need error expressions for fairly complex cases: roots,
reciprocals, big powers (e.g. x^25), etc.
Generally we employ the binomial theorem for any index.
Approximate higher powers of δx to be very small.

Theorem (Binomial theorem for any index n)

(1 + p)^n = 1 + np + [n(n − 1)/2!] p^2 + [n(n − 1)(n − 2)/3!] p^3 + · · ·

All we do is to write the error expression as (1 + α)^k and simplify.
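
A quick sketch of why keeping only the first term is enough when δx is small (my own numbers, not from the slides): the exact change in x^25 and the leading binomial term n x^(n−1) δx agree to a few significant figures.

```python
x, dx, n = 2.0, 1e-4, 25

exact_change = (x + dx)**n - x**n
first_order  = n * x**(n - 1) * dx    # leading term; higher powers of dx are negligible
print(exact_change, first_order)      # ≈ 41968.2 vs 41943.0 - agreement to about 3 significant figures
```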



Some samples

Power 100
If the error in x is δx, the error in x^100 can be got from (x + δx)^100 − x^100.
This can be rewritten as x^100 (1 + δx/x)^100 − x^100 = x^100 ((1 + δx/x)^100 − 1).
Expanding gives x^100 [1 + 100(δx/x) + (100 × 99/2!)(δx/x)^2 + (100 × 99 × 98/3!)(δx/x)^3 + · · · − 1],
and simplifying (keeping only the leading term) gives x^100 × 100(δx/x) ≡ 100 x^99 δx.

Try to convince yourself

that the error for the
square root of x is δx / (2√x)
reciprocal of x is δx / x^2
square of the cube root of x is 2δx / (3∛x)
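
These three claims can be sanity-checked numerically with a small perturbation (an illustrative sketch; x and δx are arbitrary choices):

```python
x, dx = 2.0, 1e-6

checks = [
    ("sqrt(x)",  lambda t: t**0.5,     dx / (2 * x**0.5)),
    ("1/x",      lambda t: 1 / t,      dx / x**2),
    ("x^(2/3)",  lambda t: t**(2/3),   2 * dx / (3 * x**(1/3))),
]
for name, f, predicted in checks:
    actual = abs(f(x + dx) - f(x))     # error propagated through f
    print(name, actual, predicted)     # the two columns agree to several digits
```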



Examples

Given the numbers a = 2.3, b = 4.65 and c = 2.10, where a and b
were generated by rounding while c was generated by truncating, find
the maximum possible error in
1 a + b
2 a + b − c
3 ac / (b + c)
4 (b − c)^23
The maximum possible errors δa, δb and δc are 0.5 × 10^-1,
0.5 × 10^-2 and 1 × 10^-2 respectively.

Solutions
1 Maximum error in a + b = 0.05 + 0.005 = 0.055
2 Maximum error in a + b − c = 0.05 + 0.005 + 0.01 = 0.065
3 Error in ac / (b + c):
multiplication: δ(ac) = ac(δa/a + δc/c) = 0.128
addition: δ(b + c) = δb + δc = 0.015
division: δ(ac/(b + c)) = ac/(b + c) × (δ(ac)/ac + δ(b + c)/(b + c)) = 0.026
4 Error in (b − c)^23:
subtraction: δ(b − c) = δb + δc = 0.015
power: δ((b − c)^23) = 23(b − c)^22 δ(b − c) = 5.02 × 10^7
Recap – Errors

1 Numerical Analysis
Definition: Study of algorithms for the problems of continuous
mathematics.
We care about efficiency and accuracy of algorithms.
Sources of error:
truncation errors in mathematical models;
round-off error during calculations;
representation errors, e.g. 1/3 in decimal; 1/10 in binary.
We ask two important questions:
How sensitive is the output to changes in the input? – problem
conditioning
Do roundoff errors in an algorithm propagate? – algorithm stability
2 Floating point representation
Hardware supports single (32-bit), double (64-bit) and long double
(80-bit) precision
Adding floating point numbers



Adding two floating point numbers
Consider two numbers in a system with 4 as the size of the fraction:

x = 0.1101 × 2^3 and y = 0.1011 × 2^1

Four steps to add two floating-point numbers:

1 Align the numbers so they both have the same exponent.
y = 0.1011 × 2^1 = 0.01011 × 2^2 = 0.001011 × 2^3
2 Perform the addition:
    0.1101   × 2^3
+ 0.001011 × 2^3
= 0.111111 × 2^3
3 Round the result: x + y = 1.0000 × 2^3.
4 Normalize the result: x + y = 0.1000 × 2^4.
Exercise 1: check the accuracy of the sum. Translate all operations into
our familiar decimal notation.
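
The four steps can be mimicked with exact rational arithmetic. Below is a minimal Python sketch (illustrative, not from the slides; the helper name is my own) that normalises a positive value to 0.f1f2f3f4 × 2^e, rounds the fraction to 4 bits, and reproduces x + y = 0.1000 × 2^4 = 8 for the example above.

```python
from fractions import Fraction

def round_to_4bit_float(value):
    """Normalise a positive value to f * 2**e with 1/2 <= f < 1, then keep 4 fraction bits."""
    v, e = Fraction(value), 0
    while v >= 1:                        # bring the fraction into [1/2, 1)
        v, e = v / 2, e + 1
    while v < Fraction(1, 2):
        v, e = v * 2, e - 1
    f = Fraction(round(v * 16), 16)      # round the fraction to 4 binary places
    if f == 1:                           # rounding overflowed: re-normalise (step 4 above)
        f, e = Fraction(1, 2), e + 1
    return f, e

x = Fraction(13, 16) * 2**3              # 0.1101_2 x 2^3 = 6.5
y = Fraction(11, 16) * 2**1              # 0.1011_2 x 2^1 = 1.375
f, e = round_to_4bit_float(x + y)        # exact sum 7.875, then round and normalise
print(f, e, float(f * 2**e))             # 1/2 4 8.0 -> the stored sum is 8, not 7.875
```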
Loss of Significance

Consider two numbers in a system with 4 as the size of the fraction:

x = 0.1110 × 2^3 and y = 0.1101 × 2^3
Compute x − y:
    0.1110 × 2^3
−  0.1101 × 2^3
=  0.0001 × 2^3
After normalisation: x − y = 0.1000 × 2^0.
Problem: x and y have 4 bits of significance,
but the result x − y has only one significant bit of accuracy.
Using truncated Taylor series helps avoid loss of significance in
quadratic formulae.
For polynomial evaluation, rearranging the terms into nested
multiplication form will sometimes produce a better result.
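
The quadratic-formula remark can be illustrated with a short sketch (my own example with a deliberately large b, not from the slides): the textbook formula for the small root subtracts two nearly equal numbers, while the algebraically equivalent form 2c / (−b − √(b^2 − 4ac)) avoids the cancellation.

```python
import math

a, b, c = 1.0, 1e8, 1.0
disc = math.sqrt(b*b - 4*a*c)

naive  = (-b + disc) / (2*a)     # subtracts two nearly equal numbers -> cancellation
stable = (2*c) / (-b - disc)     # rearranged form: no cancellation
print(naive, stable)             # about -7.45e-09 vs -1e-08 (the true root is approximately -1e-08)
```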



Approximations

Decimal places and significant figures.


The number x̂ is said to approximate x to d significant digits if d is
the largest positive integer for which

|x − x̂| / |x| < 10^-d / 2

Example:
If x = 3.141592 and x̂ = 3.14, then |x − x̂|/|x| = 0.000507 < 10^-2/2.
Therefore, x̂ approximates x to two significant digits.
If y = 4,000,000 and ŷ = 4,000,016, then
|y − ŷ|/|y| = 0.000004 < 10^-5/2. Therefore ŷ approximates y to five
significant digits.
If z = 0.000012 and ẑ = 0.000009, then |z − ẑ|/|z| = 0.25 < 10^0/2.
Therefore ẑ approximates z to no significant digits.
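
The definition above translates directly into a small helper (hypothetical function, illustrative only):

```python
def significant_digits(x, x_hat):
    """Largest d >= 1 with |x - x_hat| / |x| < 10**(-d) / 2, as defined above (0 if none)."""
    rel = abs(x - x_hat) / abs(x)
    d = 0
    while rel < 10**(-(d + 1)) / 2:
        d += 1
    return d

print(significant_digits(3.141592, 3.14))       # 2
print(significant_digits(0.000012, 0.000009))   # 0
```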



Uncertainty in Data

Real-world data contains uncertainty or error,

often referred to as noise.
Successive computations using noisy data do not improve precision.
The result and the inputs should have the same number of significant
digits of accuracy.
Example: p1 = 4.152 and p2 = 0.07931 both have 4 significant digits
of accuracy.
Reporting p1 + p2 = 4.23131 includes noisy digits.
The proper answer is p1 + p2 = 4.231.
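
A small helper (hypothetical name, illustrative only) for reporting a result to the number of significant figures the data supports:

```python
def to_sig_figs(value, n):
    """Round value to n significant figures."""
    return float(f"{value:.{n}g}")

p1, p2 = 4.152, 0.07931
print(p1 + p2)                   # 4.23131 (plus possible floating point noise)
print(to_sig_figs(p1 + p2, 4))   # 4.231 - reported to 4 significant digits, matching the data
```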



Propagation of error

Error in addition and subtraction is the sum of the errors in the


addends
The relative error in multiplication and division is the sum of the
relative errors in the approximations.
If a small error in the initial conditions produces a small error in the
final result, then the algorithm is stable. Otherwise it is unstable.
Suppose ϵ represents an initial error and ϵ(n) represents the growth of
the error after n steps.
If ϵ(n) ≈ nϵ, the growth of error is said to be linear.
If ϵ(n) ≈ K^n ϵ (with K > 1), the growth of error is exponential.
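
A classic illustration of exponential growth (not from the slides): the recurrence x_n = (13/3)x_{n−1} − (4/3)x_{n−2} has (1/3)^n as an exact solution, but its general solution also contains a 4^n term, so any rounding error is amplified by roughly K = 4 at each step.

```python
x_prev, x_curr = 1.0, 1.0 / 3.0            # x_0 = 1 and x_1 = 1/3 (already slightly wrong in binary)
for n in range(2, 21):
    x_prev, x_curr = x_curr, (13.0 / 3.0) * x_curr - (4.0 / 3.0) * x_prev
    exact = (1.0 / 3.0) ** n
    print(n, x_curr, abs(x_curr - exact))  # the error grows by roughly a factor of 4 per step
```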



Exercises
1 Find the error E and the relative error R. Also determine the number
of significant digits in the approximation.
x = 2.71828182, x̂ = 2.7182
y = 98.350, ŷ = 98.000
z = 0.000068, ẑ = 0.00006
2 Consider the data p1 = 1.414 and p2 = 0.09125 which have four
significant digits of accuracy. Determine the proper answer for the
sum p1 + p2 and the product p1 p2
3 Consider the data p1 = 31.415 and p2 = 0.027182 which have five
significant digits of accuracy. Determine the proper answer for the
sum p1 + p2 and the product p1 p2
4 Let P(x) = x^3 − 3x^2 + 3x − 1, Q(x) = ((x − 3)x + 3)x − 1 and
R(x) = (x − 1)^3.
Use four-digit rounding arithmetic and compute P(2.72), Q(2.72) and
R(2.72). In the computation of P(x) assume that (2.72)^3 = 20.12 and
(2.72)^2 = 7.398. Compare them to the true values.
The End

