
ICS226/ISE225: Numerical Analysis

Oliver Mhlanga

HARARE INSTITUTE OF TECHNOLOGY


omhlanga@hit.ac.zw or olivamhlanga@gmail.com; 0712531415

June 23, 2021

Error and Computer Arithmetic


Computer arithmetic and number storage

Numerical Analysis: This refers to the analysis of mathematical
problems by numerical means, especially mathematical problems
arising from models based on calculus.
Effective numerical analysis requires several things:
- An understanding of the computational tool being used, be it a
calculator or a computer.
- An understanding of the problem to be solved.
- Construction of an algorithm which will solve the given
mathematical problem to a given desired accuracy and within the
limits of the resources (time, memory, etc.) that are available.
Computer arithmetic and number storage

Definition
A mathematical model is a mathematical description of a
physical situation.
The main goal of numerical analysis is to develop efficient
algorithms for computing precise numerical values of mathematical
quantities, including functions, integrals, solutions of algebraic
equations, solutions of differential equations (both ordinary and
partial), solutions of minimization problems, and so on.
Part of this process is the consideration of the errors that arise in
these calculations, from the errors in the arithmetic operations or
from other sources.
Computer arithmetic and number storage

Computers use binary arithmetic, representing each number as a
binary number (consisting of the two bits 0 and 1): a finite sum of
integer powers of 2. Some numbers can be represented exactly, but
others, such as 1/10, 1/100, 1/1000, ... cannot.
For example, 2.125 = 2^1 + 2^{-3} has an exact representation in
binary (base 2), but
3.1 = 2^1 + 2^0 + 2^{-4} + 2^{-5} + 2^{-8} + ... does not.
And, of course, there are the transcendental numbers like π that
have no finite representation in either the decimal or the binary
number system.
Computer arithmetic and number storage
Decimal to binary conversion: For the integer part, we divide by
2 repeatedly (using integer division); the remainders are the
successive digits of the number in base 2, from least to most
significant.
Example 1 Convert (107)10 to a binary number:
Quotients 107 53 26 13 6 3 1 0
Remainders 1 1 0 1 0 1 1
Therefore, (107)10 = (1101011)2
For the fractional part, multiply the number by 2; take away the
integer part, and multiply the fractional part of the result by 2, and
so on; the sequence of integer parts are the digits of the base 2
number, from most to least significant.
Example 2 Convert (0.625)10 to a binary number:
Fractions 0.625 0.25 0.5 0
Integer parts 1 0 1
Therefore, (0.625)10 = (0.101)2
Computer arithmetic and number storage

Note that if f has an infinite representation in base 10, its
representation in base 2 will also be infinite. However, we could
have situations where f is finitely represented in base 10 and
infinitely represented in base 2.
Octal representation: A binary number can be easily represented
in base 8. Partition the number into groups of 3 binary digits
(2^3 = 8), from the binary point to the right and to the left (add zeros
if needed). Then, replace each group by its octal equivalent.
Example 3 (107.625)10 = (|1| |101| |011| . |101|)2 = (153.5)8
Computer arithmetic and number storage

Hexadecimal representation: To represent a binary number in
base 16, proceed as above, but now partition the number into
groups of 4 binary digits (2^4 = 16). The base 16 digits are
0, ..., 9, A = 10, ..., F = 15.
Example 4 (107.625)10 = (|0110| |1011| . |1010|)2 = (6B.A)16
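The two algorithms above can be combined into one routine; the following is a minimal Python sketch (to_binary is our own illustrative name, not from the notes):

def to_binary(x, frac_digits=10):
    """Convert a non-negative decimal number to a base-2 string by
    repeated division (integer part) and repeated multiplication
    (fractional part)."""
    ipart, fpart = int(x), x - int(x)
    # Integer part: remainders of division by 2, least significant first.
    bits = ""
    while ipart > 0:
        ipart, r = divmod(ipart, 2)
        bits = str(r) + bits
    bits = bits or "0"
    # Fractional part: integer parts after multiplying by 2, most significant first.
    if fpart > 0:
        bits += "."
        for _ in range(frac_digits):
            fpart *= 2
            d, fpart = int(fpart), fpart - int(fpart)
            bits += str(d)
            if fpart == 0:
                break
    return bits

print(to_binary(107.625))   # 1101011.101, i.e. (153.5) in octal, (6B.A) in hex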
Errors in Numerical Computations

In any applied numerical computation, there are three key sources
of error:
1. Inherent errors:
1.1 Inexactness of the mathematical model for the underlying
physical phenomenon (often called modeling errors).
1.2 Errors in measurements of parameters entering the model
(often called experimental errors).
(1.1) is the domain of mathematical modeling, and will not
concern us here. Neither will (1.2), which is the domain of the
experimentalists.
2. Round-off errors: arise due to the finite numerical precision
imposed by the computer.
3. Truncation errors: arise due to approximations used to solve
the full mathematical system.
Errors in Numerical Computations

(3) is the true domain of numerical analysis, and refers to the fact
that most systems of equations are too complicated to solve
explicitly, or, even in cases when an analytic solution formula is
known, directly obtaining the precise numerical values may be
difficult.
Definition
Let x be a real number and let x* be an approximation. The
absolute error in the approximation x* ≈ x is defined as
∆x = |x* − x|. The relative error is defined as the ratio of the
absolute error to the size of x, i.e., |x* − x|/|x| = ∆x/|x|,
provided x ≠ 0; otherwise the relative error is not defined.
If x* < x, then we say that the number x* is an approximate
value of the number x by defect, and if x* > x, then it is an
approximate value of x by excess.
Errors in Numerical Computations

The percentage error (a.k.a. relative percentage error) of an
approximate number is the relative error × 100%.
NB: Absolute error measures only the quantity of error; it is the
total amount of error incurred by the approximate value. The
relative error measures both the quantity and quality of the
measurement: it is the error incurred while measuring one unit. The
absolute error depends on the measuring unit, but the relative
error does not.
Example 7. Find the absolute, relative and percentage error in x*
when x = 1/3 and x* = 0.333.
Solution. The absolute error:
∆x = |x* − x| = |0.333 − 1/3| ≈ 0.00033.
Relative error: ∆x/|x| = 0.00033/(1/3) ≈ 0.001.
Percentage error: relative error × 100% = 0.001 × 100% = 0.1%.
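The same arithmetic can be checked in a few lines of Python (a throwaway sketch using exact fractions):

from fractions import Fraction

x = Fraction(1, 3)             # exact value
x_star = Fraction(333, 1000)   # approximation 0.333

abs_err = abs(x_star - x)      # absolute error ∆x = 1/3000
rel_err = abs_err / abs(x)     # relative error ∆x/|x| = 0.001
print(float(abs_err), float(rel_err), f"{float(rel_err) * 100}%")
# ≈ 0.000333..., 0.001, 0.1%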
Errors in Numerical Computations

H/W 1. Determine the absolute error and the exact number
corresponding to the approximate number x* = 5.373 if the
percentage error is 0.01%.
H/W 2. An exact number x is in the interval [28.03, 28.08].
Assuming an approximate value, find the absolute and the
percentage errors. [Hint: the middle of the given interval is taken
as the approximate value.]
Solution of Nonlinear Equations

Our goal is to develop simple numerical methods for the
approximate solution of the equation

f(x) = 0                                            (1)

where f is a real-valued function, defined and continuous on a
bounded and closed interval of the real line.
We will restrict our discussion to the case where x ∈ ℝ, but x ∈ ℂ
can also be of significant interest in practice (especially if f is a
polynomial). Solutions of nonlinear equations are called roots or
zeroes.
Some methods will be considered, such as the bisection method and
the Newton-Raphson method.
Solution of Nonlinear Equations

These methods compute a solution of a nonlinear equation starting
from an initial guess, then applying an iterative method that,
under favorable conditions, will converge to a zero of the function
f.
The number of iterations needed to achieve a solution with a given
accuracy is considered. Iterative procedures can break down (i.e.,
fail) in various ways. The sequence generated by a procedure may
diverge, oscillate, or display chaotic behavior (a.k.a. 'wandering
behavior').
Theorem
2.1. Let f be a real-valued function, defined and continuous on a
bounded closed interval [a, b] of the real line. Assume, further,
that f(a)f(b) < 0; then there exists r in [a, b] such that f(r) = 0.
Bisection Method.

The bisection method is a procedure that repeatedly 'halves' the
interval in which a root r has been located. This 'halving' process
is repeated until the desired accuracy is reached.
Suppose that we are given an interval [a, b] satisfying theorem
2.1 and an error tolerance ε > 0. The bisection method consists
of the following steps:
1. Define c = (a + b)/2.
2. If b − c ≤ ε, then accept c as the root and stop.
3. If f(b) × f(c) ≤ 0, then set a = c. Otherwise, set b = c.
Return to step 1.
The interval [a, b] is halved with each loop through steps 1 to 3.
Step 2 will be satisfied eventually, and with it the condition
|r − c| ≤ ε will be satisfied.
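Steps 1-3 translate almost verbatim into code; a minimal Python sketch (bisect and eps are our own names, and we assume f(a)f(b) < 0 on entry):

def bisect(f, a, b, eps):
    """Bisection method: assumes f is continuous and f(a)*f(b) < 0."""
    while True:
        c = (a + b) / 2.0          # step 1: midpoint
        if b - c <= eps:           # step 2: interval small enough
            return c
        if f(b) * f(c) <= 0:       # step 3: root lies in [c, b]
            a = c
        else:                      # root lies in [a, c]
            b = c

# Example 2.1 below: largest root of x^6 - x - 1 = 0 on [1, 2]
root = bisect(lambda x: x**6 - x - 1, 1.0, 2.0, 1e-3)
print(root)   # ≈ 1.1338, matching Table 1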
Bisection Method.
Example 2.1. Find the largest root of

f(x) = x^6 − x − 1 = 0                              (2)

accurate to within ε = 0.001.
With a graph, it is easy to check that 1 < r < 2.
We choose a = 1, b = 2; then f(a) = −1, f(b) = 61, and
theorem 2.1 is satisfied.
Bisection Method.

n a b c b-c f(c)
1 1.0000 2.0000 1.5000 0.5000 8.8906
2 1.0000 1.5000 1.2500 0.2500 1.5647
3 1.0000 1.2500 1.1250 0.1250 -0.0977
4 1.1250 1.2500 1.1875 0.0625 0.6167
5 1.1250 1.1875 1.1562 0.0312 0.2333
6 1.1250 1.1562 1.1406 0.0156 0.0616
7 1.1250 1.1406 1.1328 0.0078 -0.0196
8 1.1328 1.1406 1.1367 0.0039 0.0206
9 1.1328 1.1367 1.1348 0.0020 0.0004
10 1.1328 1.1348 1.1338 0.00098 -0.0096

Table 1: Bisection Method for (2)


Bisection Method.

The number n of bisections needed to bracket a root to within a given
tolerance ε is given by

n ≥ ln((b − a)/ε) / ln 2                            (3)
Bisection Method.

For the previous example (2), with b − a = 1 we need

n ≥ ln(1/0.001) / ln 2 = 9.97

i.e., we need n = 10 iterates, exactly the number computed.
Advantages of the Bisection Method
- It is guaranteed to converge.
- The error bound is guaranteed to decrease by one-half
with each iteration.
Many other numerical methods have variable rates of decrease for
the error, and these may be worse than the bisection method for
some equations.
Bisection Method.

Disadvantage of the Bisection Method
- It generally converges more slowly than most other methods.
For functions f(x) that have a continuous derivative, other
methods are usually faster. These methods may not always
converge; when they do converge, however, they are almost always
much faster than the bisection method.
Newton-Raphson Method.

We are now going to establish general methods for computing a
root of the equation f(x) = 0, where f(x) can be an algebraic or
transcendental function.
If x_n is the current estimate, then the next estimate x_{n+1} is
given by

x_{n+1} = x_n − f(x_n)/f'(x_n),  n = 0, 1, 2, ...   (4)

(4) is referred to as Newton's method, or the Newton-Raphson
method, for solving f(x) = 0.
Newton-Raphson Method.

It is guaranteed to converge if the initial guess x_0 is close enough,
but it is hard to make a clear statement about what we mean by
close enough because this is highly problem-specific. A sketch of
the graph of f(x) can help us decide on an appropriate initial guess
x_0 for a particular problem.
Example 2.2: From example 2.1, use Newton's method to solve (2).
Here f(x) = x^6 − x − 1, f'(x) = 6x^5 − 1,
and the iteration is given by

x_{n+1} = x_n − (x_n^6 − x_n − 1)/(6x_n^5 − 1)      (5)
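Iteration (5) is just as easy to run in code; a minimal sketch (newton, the tolerance and the iteration cap are our own illustrative choices):

def newton(f, df, x0, tol=1e-9, max_iter=50):
    """Newton-Raphson iteration: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) < tol:   # successive iterates close enough
            return x_new
        x = x_new
    return x

# Example 2.2: f(x) = x^6 - x - 1, starting from x0 = 1.5
root = newton(lambda x: x**6 - x - 1, lambda x: 6 * x**5 - 1, 1.5)
print(root)   # ≈ 1.134724138, as in the table below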
Newton-Raphson Method.

Newton's Method for x^6 − x − 1 = 0

n   x_n          f(x_n)     x_n − x_{n−1}   α − x_{n−1}
0 1.5 8.89E+1
1 1.30049088 2.54E+1 -2.00E-1 -3.65E-1
2 1.18148042 5.38E-1 -1.19E-1 -1.66E-1
3 1.13945559 4.92E-2 -4.20E-2 -4.68E-2
4 1.13477763 5.50E-4 -4.68E-3 -4.73E-3
5 1.13472415 7.11E-8 -5.35E-5 -5.35E-5
6 1.13472414 1.55E-15 -6.91E-9 -6.91E-9
7 1.134724138
The true root is α = 1.134724138, and x6 = α to nine significant
digits.
N.B: Newton’s method may converge slowly at first. However, as
the iterates come closer to the root, the speed of convergence
increases.
Newton-Raphson Method.

Convergence analysis
Let α be the root, so that f(α) = 0.
Define the error e_k = x_k − α. Then
e_{k+1} = x_{k+1} − α = x_k − f(x_k)/f'(x_k) − α = [f'(x_k)e_k − f(x_k)] / f'(x_k).
Expanding f(α) = f(x_k − e_k) about x_k by Taylor's theorem and using
f(α) = 0 gives f'(x_k)e_k − f(x_k) = (1/2)f''(ξ)e_k^2, so that
|e_{k+1}| ≈ C|e_k|^2,  C = |f''(α)/(2f'(α))|.
This is called quadratic convergence.
Interpolation and Approximation

Let y = f(x). Given (n + 1) distinct data points
{x_i, y_i}, i = 0, 1, 2, ..., n, we wish to find a polynomial P_n(x)
such that

P_n(x_i) = y_i                                      (6)

A polynomial satisfying (6) is called an interpolating polynomial if
y_i = f(x_i), i = 0, 1, 2, ..., n, and d^k f/dx^k |_{x*} = d^k P_n/dx^k |_{x*}
for some finite k, where x* is one of the values x_i.

P_n(x) = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n.     (7)

(7) has (n + 1) unknown coefficients; the x_i are called nodes.
This creates a system of (n + 1) equations with (n + 1) unknowns.
Interpolation and Approximation

The aim is to solve (7) and determine the coefficients
a_0, a_1, a_2, ..., a_n using condition (6). This leads to

P_n(x_0) = a_0 + a_1 x_0 + a_2 x_0^2 + ... + a_n x_0^n = y_0
P_n(x_1) = a_0 + a_1 x_1 + a_2 x_1^2 + ... + a_n x_1^n = y_1
   ...
P_n(x_n) = a_0 + a_1 x_n + a_2 x_n^2 + ... + a_n x_n^n = y_n

In matrix form

[ 1  x_0  x_0^2  ...  x_0^n ] [ a_0 ]   [ y_0 ]
[ 1  x_1  x_1^2  ...  x_1^n ] [ a_1 ] = [ y_1 ]
[ ...  ...  ...  ...  ...   ] [ ... ]   [ ... ]
[ 1  x_n  x_n^2  ...  x_n^n ] [ a_n ]   [ y_n ]
Interpolation and Approximation
Write this as XA = Y, where X is the (n + 1) × (n + 1) matrix above,
A = (a_0, a_1, ..., a_n)^T and Y = (y_0, y_1, ..., y_n)^T.
The coefficient matrix X is called the Vandermonde matrix. It can
be shown that det(X) = ∏_{0≤j<k≤n} (x_k − x_j). Since the x_i are
distinct, the Vandermonde system is non-singular, and solving this
system gives the coefficients a_i, which solve (6).
Other reasons:
- To approximate the function by polynomial interpolation;
- Other computations such as derivatives and definite integrals
of polynomials are easy to determine and are also polynomials.
Interpolation and Approximation
Example 3.1. Interpolate the given data set with a polynomial of
degree 2.
x_i | 0  1  2/3
y_i | 1  0  0.5
Solution: Let P_2(x) = a_0 + a_1 x + a_2 x^2.
We want to find the coefficients a_0, a_1, a_2. By the interpolating
properties we have:

x = 0,   y = 1:   P_2(0) = a_0 = 1
x = 1,   y = 0:   P_2(1) = a_0 + a_1 + a_2 = 0
x = 2/3, y = 0.5: P_2(2/3) = a_0 + (2/3)a_1 + (4/9)a_2 = 0.5

Solving the system gives a_0 = 1, a_1 = −1/4, a_2 = −3/4. Thus,
P_2(x) = 1 − (1/4)x − (3/4)x^2
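For larger data sets the Vandermonde system would be set up and solved numerically; a small NumPy sketch (our own illustration) reproduces Example 3.1:

import numpy as np

x = np.array([0.0, 1.0, 2.0 / 3.0])
y = np.array([1.0, 0.0, 0.5])

# Vandermonde matrix with columns 1, x, x^2 (increasing powers)
X = np.vander(x, N=3, increasing=True)
a = np.linalg.solve(X, y)   # coefficients a_0, a_1, a_2
print(a)                    # [ 1.   -0.25 -0.75], i.e. P_2(x) = 1 - x/4 - 3x^2/4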
Interpolation and Approximation
Problem: A Vandermonde system is often ill-conditioned, so the
computed coefficients a_i may be inaccurate. Furthermore, the
computational cost, including forming the matrix,
factorisation, and triangular substitutions, is excessive.
Lagrange Interpolation Polynomials
Given (n + 1) distinct data points (x_i, y_i) for i = 0, 1, ..., n,
there exists a unique P_n with deg(P_n) ≤ n that solves equation (6),
given by Lagrange's formula

P_n(x) = Σ_{i=0}^{n} y_i L_i(x) = y_0 L_0 + y_1 L_1 + ... + y_n L_n    (8)

where

L_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j)
       = [(x − x_0)...(x − x_{i−1})(x − x_{i+1})...(x − x_n)] /
         [(x_i − x_0)...(x_i − x_{i−1})(x_i − x_{i+1})...(x_i − x_n)]    (9)
Interpolation and Approximation
The L_i(x) are cardinal functions, satisfying the properties
L_i(x_j) = δ_ij = { 1 if i = j; 0 if i ≠ j },  i = 0, 1, ..., n,
where δ_ij is the Kronecker delta. They are locally supported in the
discrete sense.
Example 3.2. Write the Lagrange polynomial for the data below (same
as Example 3.1).
x_i | 0  1  2/3
y_i | 1  0  0.5
Solution: x_0 = 0, x_1 = 1, x_2 = 2/3, y_0 = 1, y_1 = 0, y_2 = 0.5.
Compute the cardinal functions:

L_0(x) = (x − x_1)(x − x_2) / [(x_0 − x_1)(x_0 − x_2)]
       = (x − 1)(x − 2/3) / [(0 − 1)(0 − 2/3)] = (3/2)(x − 2/3)(x − 1)
Interpolation and Approximation

L_1(x) = (x − x_0)(x − x_2) / [(x_1 − x_0)(x_1 − x_2)]
       = (x − 0)(x − 2/3) / [(1 − 0)(1 − 2/3)] = 3x(x − 2/3)
L_2(x) = (x − x_0)(x − x_1) / [(x_2 − x_0)(x_2 − x_1)]
       = (x − 0)(x − 1) / [(2/3 − 0)(2/3 − 1)] = −(9/2)x(x − 1)

So
P_2(x) = Σ_{i=0}^{2} y_i L_i(x) = (3/2)(x − 2/3)(x − 1) − (9/4)x(x − 1) + 0
       = 1 − (1/4)x − (3/4)x^2, the same as in Example 3.1.
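Formula (8) translates directly into code; a small sketch (lagrange_eval is an illustrative name):

def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial (8) at x."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        # Cardinal function L_i(x), formula (9)
        Li = 1.0
        for j in range(n):
            if j != i:
                Li *= (x - xs[j]) / (xs[i] - xs[j])
        total += ys[i] * Li
    return total

# Data of Example 3.2
print(lagrange_eval([0.0, 1.0, 2.0 / 3.0], [1.0, 0.0, 0.5], 0.5))
# 0.6875 = 1 - 0.5/4 - 3(0.25)/4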
H/W: Given the following 4 data points
x_i | 0  1  3  5
y_i | 1  2  6  7
find a polynomial in Lagrange form to interpolate these data.
Interpolation and Approximation

Pros and Cons of the Lagrange polynomial
- (+) It can be written down at once, since the coefficients are
the given y_i;
- (−) If more data points are added to the interpolation problem,
all L_i(x) have to be recalculated;
- (−) It is slow to compute, since each L_i(x) is different.
Numerical Differentiation
If f(x) can be computed with only n digits of precision, is it
possible to calculate f'(x) numerically with n digits of precision?
The challenge can be traced to the subtraction of quantities
that are nearly equal. In this section, several alternatives are
offered for the numerical computation of f'(x) and f''(x).
Recall that

f'(x) = lim_{h→0} [f(x + h) − f(x)] / h             (10)

Taylor's theorem:

f(x + h) = Σ_{k=0}^{∞} (1/k!) f^(k)(x) h^k
         = Σ_{k=0}^{n} (1/k!) f^(k)(x) h^k + E_{n+1},    (11)

E_{n+1} = Σ_{k=n+1}^{∞} (1/k!) f^(k)(x) h^k
        = (1/(n+1)!) f^(n+1)(ξ) h^{n+1} = O(h^{n+1})
Numerical Differentiation

E = O(h^n) means ∃ a real, finite C: |E| ≤ C|h|^n. E is of order h^n,
implying that E approaches zero at a rate similar to h^n.

f'(x_0) = [f(x_0 + h) − f(x_0)] / h − (h/2) f''(ξ).

For small values of h,

f'(x_0) ≈ [f(x_0 + h) − f(x_0)] / h

with an error bounded by M|h|/2, where M is a bound on |f''(x)| for
x ∈ (x_0, x_0 + h).
Numerical Differentiation

Forward Difference (FD):

f(x + h) = f(x) + hf'(x) + O(h^2)
⇒ hf'(x) = f(x + h) − f(x) + O(h^2)
⇒ f'(x) = [f(x + h) − f(x)] / h + O(h)

Backward Difference (BD):

f(x − h) = f(x) − hf'(x) + O(h^2)
⇒ hf'(x) = f(x) − f(x − h) + O(h^2)
⇒ f'(x) = [f(x) − f(x − h)] / h + O(h)
Numerical Differentiation

Central Difference (CD):

f(x + h) = f(x) + hf'(x) + (h^2/2!) f''(x) + (h^3/3!) f'''(x) + (h^4/4!) f^iv(x) + ...
f(x − h) = f(x) − hf'(x) + (h^2/2!) f''(x) − (h^3/3!) f'''(x) + (h^4/4!) f^iv(x) + ...
⇒ f(x + h) − f(x − h) = 2hf'(x) + 2(h^3/3!) f'''(x) + ...
⇒ f'(x) = [f(x + h) − f(x − h)] / (2h) + O(h^2)

The FD, BD and CD formulas each involve 2 function calls, 1
subtraction, and 1 division: the same computation time.
The FD and BD formulas are comparable in accuracy.
The CD formula is the most accurate and most recommended method;
however, sometimes CD cannot be applied.
Numerical Differentiation

Higher Order Difference Formulas

f(x + h) = f(x) + hf'(x) + (h^2/2!) f''(x) + (h^3/3!) f'''(x) + (h^4/4!) f^iv(x) + ...
f(x − h) = f(x) − hf'(x) + (h^2/2!) f''(x) − (h^3/3!) f'''(x) + (h^4/4!) f^iv(x) + ...
⇒ f(x + h) + f(x − h) = 2f(x) + 2(h^2/2!) f''(x) + 2(h^4/4!) f^iv(x) + ...
⇒ f''(x) = [f(x + h) − 2f(x) + f(x − h)] / h^2 + O(h^2),
  with Error = −f^iv(ξ) h^2 / 12
Numerical Differentiation

Other Higher Order Formulas

f''(x) ≈ [f(x + h) − 2f(x) + f(x − h)] / h^2
f'''(x) ≈ [f(x + 2h) − 2f(x + h) + 2f(x − h) − f(x − 2h)] / (2h^3)
f^iv(x) ≈ [f(x + 2h) − 4f(x + h) + 6f(x) − 4f(x − h) + f(x − 2h)] / h^4

Other formulas for f''(x), f'''(x), ... are also possible.
Numerical Differentiation

Example 4.1: f(x) = x^3, f'(x) = 3x^2; compute f'(1) using FD,
BD, CD.
Solution:
FD: h = 0.1, f'(1) ≈ [f(1.1) − f(1.0)] / 0.1 = 3.31
BD: h = 0.1, f'(1) ≈ [f(1.0) − f(0.9)] / 0.1 = 2.71
CD: h = 0.1, f'(1) ≈ [f(1.1) − f(0.9)] / 0.2 = 3.01
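These three approximations are easy to reproduce; a small sketch (the names fd, bd and cd are ours):

def fd(f, x, h):  # forward difference, O(h)
    return (f(x + h) - f(x)) / h

def bd(f, x, h):  # backward difference, O(h)
    return (f(x) - f(x - h)) / h

def cd(f, x, h):  # central difference, O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**3
print(fd(f, 1.0, 0.1), bd(f, 1.0, 0.1), cd(f, 1.0, 0.1))
# ≈ 3.31, 2.71, 3.01 — the CD value is closest to f'(1) = 3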
Numerical Differentiation

Example 4.2: The distance x of a runner from a fixed point is
measured (in metres) at intervals of half a second. The data
obtained is
t | 0.0  0.5  1.0  1.5  2.0
x | 0.00 3.65 6.80 9.90 12.15
1. Use central differences to approximate the runner's velocity at
times t = 0.5s and t = 1.25s.
2. Use a central difference to approximate the runner's
acceleration at time t = 1.5s. H/W.
Numerical Differentiation

Solution (4.2.1):
Our aim here is to approximate x'(t). The choice of h is dictated
by the available data. Using data with t = 0.5s at its centre we
obtain

x'(0.5) ≈ [x(1.0) − x(0.0)] / (2 × 0.5) = 6.80 m/s    (12)

Data centred at t = 1.25s gives us the approximation

x'(1.25) ≈ [x(1.5) − x(1.0)] / (2 × 0.25) = 6.20 m/s    (13)

Note the value of h used.
Numerical Integration

Problem: Given a function f(x), defined on an interval [a, b], we
want to find an approximation to the definite integral

I(f) = ∫_a^b f(x) dx                                (14)

The indefinite integral of a function is another function or a class
of functions, whereas the definite integral of a function over a
fixed interval is a number.
Main idea:
- Partition [a, b] into smaller sub-intervals;
- In each sub-interval, find a polynomial p_i(x) ≈ f(x);
- Integrate p_i(x) on each sub-interval, and sum the results.
Numerical Integration

∫_a^b f(x) dx ≈ h[(1/2)f(x_0) + (1/2)f(x_n) + Σ_{i=1}^{n−1} f(x_i)]    (15)

Equation (15) is the Trapezoid rule.

Example 5.1: Evaluate I(f) = ∫_0^1 e^{−x^2} dx by the Trapezoid rule
with n = 10.
Numerical Integration
Answer: h = (1 − 0)/10 = 0.1
i   x_i   x_i^2   f(x_i) = e^{−x_i^2}
0   0.0   0.0    1.000000
1   0.1   0.01   0.990050
2   0.2   0.04   0.960789
3   0.3   0.09   0.913931
4   0.4   0.16   0.852144
5   0.5   0.25   0.778801
6   0.6   0.36   0.697676
7   0.7   0.49   0.612626
8   0.8   0.64   0.527292
9   0.9   0.81   0.444858
10  1.0   1.00   0.367879
I(f) ≈ T(f; h) = 0.1[0.5 × (1 + 0.367879) + 0.990050 +
0.960789 + ... + 0.444858] = 0.746211
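The composite rule (15) is a one-loop computation; a short sketch (trapezoid is our own name):

import math

def trapezoid(f, a, b, n):
    """Composite Trapezoid rule (15) with n equal sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))          # endpoint weights 1/2
    for i in range(1, n):                # interior points, weight 1
        total += f(a + i * h)
    return h * total

print(trapezoid(lambda x: math.exp(-x**2), 0.0, 1.0, 10))
# ≈ 0.746211, as computed above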
Numerical Integration

Error Analysis: Trapezoid Rule.

Theorem
THEOREM ON PRECISION OF TRAPEZOID RULE
If f'' exists and is continuous on the interval [a, b], and if the
Trapezoid rule T(f; h) with uniform spacing h is used to estimate
the integral I(f) = ∫_a^b f(x) dx, then for some ξ ∈ (a, b),

E_T(f; h) = I(f) − T(f; h) = −(1/12)(b − a) h^2 f''(ξ) = O(h^2)
Numerical Integration

Example 5.2. If the Trapezoid rule is to be used to compute
∫_0^2 e^x dx
with an error of at most 0.5 × 10^{−4}, how many points should be
used?
Numerical Integration

Solution:
f(x) = e^x, f''(x) = e^x, a = 0, b = 2, max_{x∈(a,b)} |f''(x)| = e^2.
By the error bound, it is sufficient to require that

|E_T(f; h)| ≤ (1/6) h^2 e^2 ≤ 0.5 × 10^{−4}
⇒ h^2 ≤ 6 × 0.5 × 10^{−4} × e^{−2} ≈ 4.06 × 10^{−5}
⇒ 2/n = h ≤ √(4.06 × 10^{−5}) ≈ 0.0064
⇒ n ≥ 313.8

Thus n = 314 sub-intervals, i.e. at least 314 + 1 = 315 points, will
certainly produce the desired accuracy.
Numerical Integration
Simpson's rule.
A numerical integration rule over two equal sub-intervals with
partition points a, a + h, and a + 2h = b is the widely used basic
Simpson's Rule:
Numerical Integration

∫_a^{a+2h} f(x) dx ≈ (h/3)[f(a) + 4f(a + h) + f(a + 2h)]    (16)

Simpson's Rule computes exactly the integral of an interpolating
quadratic polynomial over an interval of length 2h using three
points; namely, the two endpoints and the middle point.
To derive the composite Simpson's Rule for approximating ∫_a^b f(x) dx,
we divide the interval into an even number of sub-intervals of width

h = (b − a)/(2m),  x_0 = a,  x_{2m} = b,  x_{i+1} − x_i = h,  2m = n.

It can be derived by integrating over each interval [x_i, x_{i+2}] the
Lagrange quadratic polynomial p_i:
Numerical Integration

Thus, we obtain

∫_a^b f(x) dx ≈ (h/3){[f(a) + f(b)] + 4 Σ_{i=1}^{n/2} f(a + (2i − 1)h)
              + 2 Σ_{i=1}^{(n−2)/2} f(a + 2ih)} = S(f; h),

which is Simpson's rule.
Numerical Integration
Example 5.3. Evaluate I(f) = ∫_0^1 e^{−x^2} dx by Simpson's rule with
2m = 10.
Answer: h = (1 − 0)/10 = 0.1
i   x_i   x_i^2   f(x_i) = e^{−x_i^2}
0   0.0   0.0    1.000000
1   0.1   0.01   0.990050
2   0.2   0.04   0.960789
3   0.3   0.09   0.913931
4   0.4   0.16   0.852144
5   0.5   0.25   0.778801
6   0.6   0.36   0.697676
7   0.7   0.49   0.612626
8   0.8   0.64   0.527292
9   0.9   0.81   0.444858
10  1.0   1.00   0.367879
I(f) ≈ S(f; h) = (0.1/3)[(1 + 0.367879)
  + 4(0.990050 + 0.913931 + 0.778801 + 0.612626 + 0.444858)
  + 2(0.960789 + 0.852144 + 0.697676 + 0.527292)] ≈ 0.746825
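A sketch of the composite Simpson rule in code (simpson is our own name; the number of sub-intervals must be even):

import math

def simpson(f, a, b, n):
    """Composite Simpson's rule; n (= 2m) must be even."""
    assert n % 2 == 0, "n must be even"
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)  # weights alternate 4, 2
    return h * total / 3

print(simpson(lambda x: math.exp(-x**2), 0.0, 1.0, 10))
# ≈ 0.746825, very close to the true value 0.746824...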
Numerical Integration

Error Estimate for Simpson's rule.

The error term is E_S(f; h) = −(1/180)(b − a) h^4 f^iv(ξ). Proof: H/W.
This gives the error bound

|E_S(f; h)| ≤ (1/180)(b − a) h^4 max_{x∈(a,b)} |f^iv(x)| ⇒ 4th order.

Example 5.4. With f(x) = e^x defined on [0, 2], use Simpson's rule
to compute ∫_0^2 f(x) dx. In order to achieve an error ≤ 0.5 × 10^{−4},
how many points must we take?
Numerical Integration

Answer:

|E_S(f; h)| ≤ (2/180) h^4 e^2 ≤ 0.5 × 10^{−4}
⇒ h^4 ≤ (180/2) × 0.5 × 10^{−4} × e^{−2} ≈ 6.09 × 10^{−4}
⇒ h ≤ 0.157
⇒ m = (b − a)/(2h) ≈ 6.4, so take m = 7.

We need at least 2m + 1 = 15 points — far fewer than the 315 needed
by the Trapezoid rule (more powerful!).
Systems of Linear Equations

A system of n linear equations in n unknowns (variables) is written
as

a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
...
a_i1 x_1 + a_i2 x_2 + ... + a_in x_n = b_i          (17)
...
a_n1 x_1 + a_n2 x_2 + ... + a_nn x_n = b_n

where x_1, x_2, ..., x_n are the unknowns (variables) of the system and
a_11, a_12, ..., a_nn are the coefficients of the unknowns. The numbers
b_1, b_2, ..., b_n are the constants or free terms of the system.
Systems of Linear Equations

The system of equations (17) can be written as

Σ_{j=1}^{n} a_ij x_j = b_i,  i = 1, 2, ..., n       (18)

Also, (17) can be written in matrix form as

AX = b                                              (19)

where

A = [ a_11 a_12 ... a_1n ]    X = [ x_1 ]    b = [ b_1 ]
    [ a_21 a_22 ... a_2n ]        [ x_2 ]        [ b_2 ]
    [ ...  ...  ... ...  ]        [ ... ]        [ ... ]
    [ a_n1 a_n2 ... a_nn ]        [ x_n ]        [ b_n ]
Systems of Linear Equations
- The system of linear equations (17) is consistent if it has a
solution. If a system of linear equations has no solution, then
it is inconsistent (or incompatible).
- A consistent system of linear equations may have one solution
or several solutions, and is said to be determinate if there is
one solution and indeterminate if there is more than one
solution.
Generally, the following three types of elementary
transformations of a system of linear equations are used.
1. Interchange: The order of two equations can be changed.
2. Scaling: Multiplication of both sides of an equation of the
system by any non-zero number.
3. Replacement: Addition to (subtraction from) both sides of
one equation of the corresponding sides of another equation
multiplied by any number.
Systems of Linear Equations

A system in which the constant terms b1 , b2 , ..., bn are zero is


called a homogeneous system. Two basic techniques are used to
solve a system of linear equations:
1. direct method: which are techniques that give a solution in a
finite number of steps, subject to only round-off errors are
used in this chapter. Several direct methods are used to solve
a system of equations, among them following are most useful.
(i) Cramers rule, (ii) matrix inversion, (iii) Gauss
elimination, (iv) decomposition, etc.
2. iteration method: The most widely used iteration methods are
(i) Jacobi’s iteration, (ii) Gauss-Seidal’s iteration, etc.
System of equation (19) has a unique solution X = A−1 b where
the coefficient matrix A is non-singular (or regular/ invertible).
Systems of Linear Equations
Gaussian Elimination Method:
- This is the most widely used method to solve the system of
equations (19). The crucial step is to construct an equivalent
upper-triangular form which is easier to solve. This process, in
turn, is equivalent to finding the factorization A = LU, where L is a
unit lower triangular matrix and U is an upper triangular
matrix. This factorization is especially useful when solving
many linear systems involving the same coefficient matrix but
different right-hand sides, which occurs in various applications.
- If the coefficient matrix A is symmetric, positive definite,
triangular, banded, block, or sparse, the general approach of
Gaussian elimination with partial pivoting needs to be
modified or rewritten specifically for the system. If the
coefficient matrix has predominantly zero entries, the system
is sparse and iterative methods can require much less
computer memory than Gaussian elimination.
Systems of Linear Equations

Gaussian elimination consists of (a) forward elimination and (b)
backward substitution. Forward elimination reduces the system (19)
to an equivalent system Ux = b of the form

[ u_11 u_12 ... u_1n ] [ x_1 ]   [ b_1 ]
[ 0    u_22 ... u_2n ] [ x_2 ] = [ b_2 ]
[ ...  ...  ... ...  ] [ ... ]   [ ... ]
[ 0    0    ... u_nn ] [ x_n ]   [ b_n ]

Solving this triangular system is called back substitution: the
solutions x_i are computed in reverse order by
Systems of Linear Equations

xn = bn /unn ,
xn−1 = (bn−1 − un−1,n xn )/un−1,n−1 ,
...
x1 = (b1 − u12 x2 − u13 x3 − · · · − u1n xn )/u11
(20)

provided allPuii 6= 0. The general formulation is


xi = (bi − nj=i+1 uij xj )/uii , i = n, n − 1, ..., 1.
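The general formula maps directly onto a short routine; a sketch (back_substitute is an illustrative name; U is assumed upper triangular with non-zero diagonal):

def back_substitute(U, b):
    """Solve Ux = b for upper triangular U by formula (20)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):        # i = n-1, ..., 0
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]       # requires U[i][i] != 0
    return x

# Triangular system obtained in Example 6.1 below
U = [[-0.04, 0.04, 0.12], [0, -1, 2], [0, 0, 1]]
print(back_substitute(U, [3, 43, 25]))    # ≈ [7.0, 7.0, 25.0]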
Example 6.1. Solve the following system of equations by Gauss
elimination:

−0.04x_1 + 0.04x_2 + 0.12x_3 = 3                    (21)
0.56x_1 − 1.56x_2 + 0.32x_3 = 1                     (22)
−0.24x_1 + 1.24x_2 − 0.28x_3 = 0                    (23)
Systems of Linear Equations

Solution: in matrix form, the augmented matrix is

[ −0.04  0.04  0.12 | 3 ]
[  0.56 −1.56  0.32 | 1 ]
[ −0.24  1.24 −0.28 | 0 ]

Forward elimination: R1 × (0.56/0.04) + R2 ↦ R2 and
R3 − R1 × (0.24/0.04) ↦ R3 yields:

[ −0.04  0.04  0.12 | 3   ]
[  0     −1    2    | 43  ]
[  0      1    −1   | −18 ]
Systems of Linear Equations

R2 + R3 ↦ R3 yields

[ −0.04  0.04  0.12 | 3  ]
[  0     −1    2    | 43 ]
[  0      0    1    | 25 ]

Backward substitution then gives

x_3 = 25/1 = 25
x_2 = [43 − (2)(25)] / (−1) = 7
x_1 = [3 − (0.12)(25) − (0.04)(7)] / (−0.04) = 7
Systems of Linear Equations

Scaled Partial Pivoting:
- The diagonal elements of U are called pivots. The k-th pivot is
the coefficient of the k-th variable in the k-th equation at the
k-th step of the elimination. Both the computation of the
multipliers and the back substitution require divisions by the
pivots. Consequently, the algorithm cannot be carried out if
any of the pivots are zero.
- Pivoting is used to change the sequential order of the equations
for two purposes:
1. to prevent diagonal coefficients from becoming zero,
2. to make each diagonal coefficient larger in magnitude than any
other coefficient below it. Changing the order increases the
accuracy of the computations since round-off error is reduced.
Systems of Linear Equations
Example 6.2a.
Consider the system of equations

2x + 16y + z = 39,
x + y + 25z = 83,
10x + y + z = 19.

Solve the system correct to 4 decimal places by Gaussian
elimination with scaled partial pivoting.
Solution: Rearranging the system to make it diagonally dominant:

10x + y + z = 19,
2x + 16y + z = 39,
x + y + 25z = 83.
Systems of Linear Equations
Augmented matrix

[ 10  1   1  | 19 ]
[ 2   16  1  | 39 ]
[ 1   1   25 | 83 ]

Forward elimination: R2 ↦ R2 − (2/10)R1 and R3 ↦ R3 − (1/10)R1

[ 10  1       1       | 19     ]
[ 0   158/10  8/10    | 352/10 ]
[ 0   9/10    249/10  | 811/10 ]

R2 ↦ 10R2, R3 ↦ 10R3

[ 10  1    1   | 19  ]
[ 0   158  8   | 352 ]
[ 0   9    249 | 811 ]
Systems of Linear Equations

R3 ↦ R3 − (9/158)R2

[ 10  1    1          | 19         ]
[ 0   158  8          | 352        ]
[ 0   0    39270/158  | 124970/158 ]

Backward substitution:

(39270/158) z = 124970/158 ⇒ z = 3.1823
158y + 8z = 352 ⇒ y = [352 − 8(3.1823)] / 158 = 2.0667
10x + y + z = 19 ⇒ x = (19 − 2.0667 − 3.1823) / 10 = 1.3751
Systems of Linear Equations

The quantity Cond(A), called the condition number of the matrix A, is
defined by

Cond(A) = ||A|| ||A^{−1}||

where ||A|| is any matrix norm; it gives a measure of the condition of
the matrix A. A large value of Cond(A) indicates ill-conditioning
of the matrix or of the associated system of equations.
A system is called ill-conditioned or ill-posed if a small change in
the coefficients of the system produces a large change in the solution
(unstable); however, if the change in the solution is small for small
changes in the coefficients, then the system is called
well-conditioned or well-posed.
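For a concrete matrix this quantity is one library call away; a quick sketch (NumPy's cond uses the 2-norm by default):

import numpy as np

A = np.array([[10.0, 1.0, 1.0],
              [2.0, 16.0, 1.0],
              [1.0, 1.0, 25.0]])   # matrix of Example 6.2a

# Cond(A) = ||A|| * ||A^{-1}||
print(np.linalg.cond(A))   # a modest value: this system is well-conditioned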
Iteration Methods.
- If the system of equations has a large number of variables,
then the direct methods are not very suitable. In this case,
approximate numerical methods are used to determine the
variables of the system.
- The efficiency of an approximate method depends on the choice
of the initial vector and the rate of convergence of the process.
The following two approximate methods are widely used to solve a
system of linear equations:
1. the method of iteration (Jacobi's iteration method), and
2. Gauss-Seidel's iteration method.
Systems of Linear Equations
Jacobi's Iteration Method.
Consider the linear system (17), and assume that the quantities a_ii
are pivot elements (a_ii ≠ 0). System (17) can be written as

x_1 = (1/a_11)(b_1 − a_12 x_2 − a_13 x_3 − ... − a_1n x_n)
x_2 = (1/a_22)(b_2 − a_21 x_1 − a_23 x_3 − ... − a_2n x_n)
...
x_n = (1/a_nn)(b_n − a_n1 x_1 − a_n2 x_2 − ... − a_n,n−1 x_{n−1}).

In general, if x_1^(k), x_2^(k), ..., x_n^(k) are the k-th approximate
roots, then the next approximate roots are given by
Systems of Linear Equations

1 (k) (k) (k)


x1k+1 = (b1 − a12 x2 − a13 x3 − · · · − a1n xn )
a11
k+1 1 (k) (k) (k)
x2 = (b2 − a21 x1 − a23 x3 − · · · − a2n xn )
a22
...
1 (k) (k) (k)
xnk+1 = (bn − an1 x1 − an2 x2 − · · · − ann−1 xn−1 ).
ann
k = 0, 1, 2...
The iteration process is continued until all the roots converge to
the required number of significant figures.
The Jacobi’s iteration method and Gauss-Seidel method converge
if the coefficient
Pn matrix is diagonally dominant i.e.,
(|aii | ≥ j=1,j6=i aij )
(k+1) (k)
The method is repeated until |xi − xi | < ∀i = 1, 2, ..., n,
where  > 0 is any pre-assigned number called the error tolerance.
Systems of Linear Equations

Example 6.4. Solve the following system of linear equations by
the Gauss-Jacobi method, correct up to four decimal places.
27x + 6y − z = 54, 6x + 15y + 2z = 72, x + y + 54z = 110.
Solution: Obviously, the system is diagonally dominant, as
|6| + |−1| < |27|, |6| + |2| < |15|, |1| + |1| < |54|.
The Gauss-Jacobi iteration scheme is

x^(k+1) = (1/27)(54 − 6y^(k) + z^(k))
y^(k+1) = (1/15)(72 − 6x^(k) − 2z^(k))
z^(k+1) = (1/54)(110 − x^(k) − y^(k))

Let the initial solution be (0, 0, 0). The next iterations are shown
in the following table.
Systems of Linear Equations

k x y z
0 0 0 0
1 2.00000 4.80000 2.03704
2 1.00878 3.72839 1.91111
3 1.24225 4.14167 1.94931
4 1.15183 4.04319 1.93733
5 1.17327 4.08096 1.94083
6 1.16500 4.07191 1.93974
7 1.16697 4.07537 1.94006
8 1.16614 4.07454 1.93996
9 1.16640 4.07488 1.93999
10 1.16632 4.07477 1.93998
11 1.16635 4.07481 1.93998
The solution correct up to four decimal places is
x = 1.1664, y = 4.0748, z = 1.9400.
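The scheme above is a few lines of code; a minimal sketch (jacobi, the tolerance and the iteration cap are our own choices):

def jacobi(A, b, x0, eps=0.5e-4, max_iter=100):
    """Jacobi iteration: each component uses only the previous iterate."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if all(abs(x_new[i] - x[i]) < eps for i in range(n)):
            return x_new
        x = x_new
    return x

A = [[27, 6, -1], [6, 15, 2], [1, 1, 54]]
print(jacobi(A, [54, 72, 110], [0, 0, 0]))
# ≈ [1.1664, 4.0748, 1.9400], as in the table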
Systems of Linear Equations

Gauss-Seidel's Iteration Method.

A simple modification of Jacobi's iteration sometimes gives faster
convergence. The modified method is known as Gauss-Seidel's
iteration method.
Consider again system (17). If x_i^(k), i = 1, 2, ..., n, is the k-th
approximate value of x_i, then the (k + 1)-th approximate values of
x_1, x_2, ..., x_n are given by
Systems of Linear Equations

x_1^(k+1) = (1/a_11)(b_1 − a_12 x_2^(k) − a_13 x_3^(k) − ... − a_1n x_n^(k))
x_2^(k+1) = (1/a_22)(b_2 − a_21 x_1^(k+1) − a_23 x_3^(k) − ... − a_2n x_n^(k))
...
x_i^(k+1) = (1/a_ii)(b_i − a_i1 x_1^(k+1) − ... − a_i,i−1 x_{i−1}^(k+1)
            − a_i,i+1 x_{i+1}^(k) − ... − a_in x_n^(k))
...
x_n^(k+1) = (1/a_nn)(b_n − a_n1 x_1^(k+1) − a_n2 x_2^(k+1) − ... − a_n,n−1 x_{n−1}^(k+1)),
k = 0, 1, 2, ...
Systems of Linear Equations

The method is repeated until |x_i^(k+1) − x_i^(k)| < ε for all
i = 1, 2, ..., n, where ε > 0 is a pre-assigned number called the
error tolerance.
Example 6.5. Solve the following system of equations by
Gauss-Seidel's iteration method, correct up to four decimal places.
27x + 6y − z = 54, 6x + 15y + 2z = 72, x + y + 54z = 110.
Solution. The iteration scheme is

x^(k+1) = (1/27)(54 − 6y^(k) + z^(k))
y^(k+1) = (1/15)(72 − 6x^(k+1) − 2z^(k))
z^(k+1) = (1/54)(110 − x^(k+1) − y^(k+1))

Let y = 0, z = 0 be the initial solution. The successive iterations
are shown below.
Systems of Linear Equations

k   x        y        z
0   -        0        0
1   2.00000  4.00000  1.92593
2   1.18244  4.07023  1.93977
3   1.16735  4.07442  1.93997
4   1.16642  4.07477  1.93998
5   1.16635  4.07480  1.93998
6   1.16634  4.07480  1.93998

The solution correct up to four decimal places is
x = 1.1663, y = 4.0748, z = 1.9400.
Remark: Usually, the Gauss-Seidel method converges more rapidly
than the Gauss-Jacobi method. But this is not always true:
there are some examples in which the Gauss-Jacobi method
converges faster than the Gauss-Seidel method.
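The only change from the Jacobi sketch above is that updated components are used immediately within each sweep; again a minimal illustration (gauss_seidel is our own name):

def gauss_seidel(A, b, x0, eps=0.5e-4, max_iter=100):
    """Gauss-Seidel iteration: updates components in place."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        x_old = list(x)
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)  # mixes new and old values
            x[i] = (b[i] - s) / A[i][i]
        if all(abs(x[i] - x_old[i]) < eps for i in range(n)):
            return x
    return x

A = [[27, 6, -1], [6, 15, 2], [1, 1, 54]]
print(gauss_seidel(A, [54, 72, 110], [0, 0, 0]))
# converges in ~6 sweeps to ≈ [1.1663, 4.0748, 1.9400]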