
EGM 6341

Engineering Analyses
of Numerical Methods
Lecture 1

Syllabus
Introduction
1.1 Why Numerical Method?

Example 1

Consider steady-state heat conduction in a square domain ABCD (A bottom-left, B bottom-right, C top-right, D top-left) with boundary conditions: T = T2 on wall AB, ∂T/∂x = 0 on wall BC, ∂T/∂y = 0 on wall DC, and T = T1 < T2 on wall AD.

Governing equation: ∇²T = ∂²T/∂x² + ∂²T/∂y² = 0

How do we determine the heat flow from wall AB to wall AD?

Possible solutions:
1. Experiment
2. Analytical solution -- possible for this geometry
3. Numerical solution -- solving an elliptic PDE
1.1 Why Numerical Method?

Represent the field by a grid of nodes: i = 1, ..., Nx in x and j = 1, ..., Ny in y, with spacings Δx and Δy.

Use central differences for the second-order derivatives in ∂²T/∂x² + ∂²T/∂y² = 0; each node Ti,j is coupled to its neighbors Ti-1,j, Ti+1,j, Ti,j-1, and Ti,j+1.

=> A system of linear algebraic equations for Ti,j

=> Solve the system of equations to get Ti,j

Plot contours, compute fluxes, ...
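The procedure above can be sketched in a few lines of code. The following is a minimal Jacobi-iteration sketch (not the lecture's own code); the grid size, wall temperatures, tolerance, and sweep cap are illustrative assumptions:

```python
# Minimal Jacobi iteration for the 2-D Laplace equation (illustrative sketch).
# Grid size, boundary values, and tolerance are assumed for demonstration.
N = 21                      # nodes per side (Nx = Ny = N)
T1, T2 = 0.0, 1.0           # wall temperatures, T1 < T2

# Initialize the field; rows j = 0..N-1 (y), columns i = 0..N-1 (x).
T = [[0.0] * N for _ in range(N)]
for i in range(N):
    T[0][i] = T2            # wall AB (bottom): T = T2
for j in range(N):
    T[j][0] = T1            # wall AD (left):  T = T1

for sweep in range(5000):
    diff = 0.0
    Tn = [row[:] for row in T]
    for j in range(1, N - 1):
        for i in range(1, N - 1):
            # Central differences => node value is the average of its 4 neighbors
            Tn[j][i] = 0.25 * (T[j][i-1] + T[j][i+1] + T[j-1][i] + T[j+1][i])
            diff = max(diff, abs(Tn[j][i] - T[j][i]))
    # Insulated walls: dT/dy = 0 on DC (top), dT/dx = 0 on BC (right)
    for i in range(N):
        Tn[N-1][i] = Tn[N-2][i]
    for j in range(N):
        Tn[j][N-1] = Tn[j][N-2]
    T = Tn
    if diff < 1e-6:
        break

# Interior values must lie between the wall temperatures (maximum principle).
interior = [T[j][i] for j in range(1, N-1) for i in range(1, N-1)]
```

From the converged field one can then plot contours and compute wall fluxes by differencing near the walls.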
1.1 Why Numerical Method?

Example 2

Consider steady-state heat conduction in a square with a cut; the boundary conditions are as in Example 1: T = T2 on AB, ∂T/∂x = 0 on BC, ∂T/∂y = 0 on DC, and T = T1 < T2 on AD.

Governing equation: ∇²T = 0

T(x, y) = ?

Possible solutions:
1. Experiment
2. Analytical solution? -- not possible for this geometry
3. Numerical solution -- same procedure as for the full square

1.1 Why Numerical Method?

Example 3

Unsteady heat conduction in a non-simple geometry (same boundary conditions as before):

∂T/∂t = α ∇²T

Possible solutions:
1. Experiment
2. Analytical solution -- not possible for this geometry
3. Numerical solution -- solving a parabolic PDE; also need to discretize in time, t1, t2, ..., tn, with Tⁿ the field at time tn.

Cautions:
• NO numerical method is completely trouble free in all situations.
• NO numerical method is completely error free.
• NO numerical method is optimal for all situations.
• Be careful about: ACCURACY, EFFICIENCY, & STABILITY.
1.1 Why Numerical Method?

Example 4

Solve a simple ODE: dy/dt = -10y, y(0) = 1

First note: the exact solution is yexact(t) = e^(-10t).

Approximate the LHS dy/dt as (y^(n+1) - y^n)/Δt and treat the RHS -10y as -10 y^n (n = 0, 1, 2, 3, ...):

(y^(n+1) - y^n)/Δt = -10 y^n

=> y^(n+1) = y^n - 10Δt y^n = y^n (1 - 10Δt), with y^0 = 1.

y^(n+1) = y^n (1 - 10Δt) = y^(n-1) (1 - 10Δt)² = ... = y^0 (1 - 10Δt)^(n+1)

Choose Δt = 0.05, 0.1, 0.2, and 0.5
=> 10Δt = 0.5, 1, 2, and 5;  1 - 10Δt = 0.5, 0, -1, and -4.
See what happens!
1.1 Why Numerical Method?

Example 4: Solve dy/dt = -10y, y(0) = 1;  y^(n+1) = y^0 (1 - 10Δt)^(n+1)

n    y^n (Δt=0.05)   y^n (Δt=0.1)   y^n (Δt=0.2)   y^n (Δt=0.5)
0    1               1              1              1
1    0.5             0              -1             -4
2    0.25            0              1              16
3    0.125           0              -1             -64
4    0.0625          0              1              256
5    0.03125         0              -1             -1024
6    0.0156250       0              1              4096
7    0.0078125       0              -1             ...
     ok              inaccurate     oscillates     blows up

(Figure: numerical solution with Δt = 0.05 vs the exact solution y = e^(-10t) over 0 ≤ t ≤ 1.5.)

Questions:
• Why does the solution blow up for Δt = 0.5?
• How to detect/prevent numerical instability (blowing up) in general?
• How to improve accuracy (cf. the case with Δt = 0.05)?
• How to get the solution efficiently if a large system is solved?
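The four behaviors in the table follow directly from the amplification factor (1 - 10Δt). A short sketch (illustrative, not the course's code) reproduces the table entries:

```python
import math

def euler(dt, nsteps):
    """Explicit (forward) Euler for dy/dt = -10*y, y(0) = 1."""
    y = 1.0
    for _ in range(nsteps):
        y = y * (1.0 - 10.0 * dt)   # y^(n+1) = y^n (1 - 10*dt)
    return y

# Amplification factor (1 - 10*dt): the scheme is stable only when
# |1 - 10*dt| <= 1, i.e. dt <= 0.2 for this equation.
for dt in (0.05, 0.1, 0.2, 0.5):
    print(dt, euler(dt, 4))
```

Even the stable Δt = 0.05 case is only crudely accurate; improving accuracy and efficiency is the subject of later chapters.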
1.2 Mathematical Preliminary

1.2.1 Intermediate Value Theorem

Let f(x) be a continuous function on the finite interval a ≤ x ≤ b, and define

m = Infimum f(x),  M = Supremum f(x)   (both over a ≤ x ≤ b)

Then for any number z in the interval [m, M], there is at least one point ξ in [a, b] for which
f(ξ) = z.
In particular, there are points x_m and x_M in [a, b] for which m = f(x_m) and M = f(x_M).
EGM 6341
Engineering Analyses
of Numerical Methods
Lecture 2

Math preliminary
1.2 Mathematical Preliminary

1.2.2 Mean Value Theorem

Let f(x) be continuous on the finite interval a ≤ x ≤ b, and let it be differentiable on (a, b).
Then there is at least one point ξ in [a, b] for which

f(b) - f(a) = f′(ξ)(b - a)   (1)

Graphical interpretation: some tangent slope f′(ξ) equals the chord slope [f(b) - f(a)]/(b - a).

1.2.3 Integral Mean Value Theorem (IMVT)

Let w(x) be nonnegative (w ≥ 0) and integrable on [a, b] and let f(x) be continuous on [a, b]. Then

∫_a^b w(x) f(x) dx = f(ξ) ∫_a^b w(x) dx   (2)

for some ξ in [a, b].
1.2 Mathematical Preliminary

1.2.4 Taylor series expansion

Let f(x) have n+1 continuous derivatives on [a, b] for some n ≥ 0, and let x, x0 ∈ [a, b]. Then

f(x) = Pn(x) + Rn+1(x)   (3)

where

Pn(x) = f(x0) + f′(x0)(x - x0) + f″(x0)/2! (x - x0)² + f‴(x0)/3! (x - x0)³ + ... + (x - x0)ⁿ/n! f⁽ⁿ⁾(x0)   (4)

Rn+1(x) = 1/n! ∫_{x0}^{x} (x - t)ⁿ f⁽ⁿ⁺¹⁾(t) dt   (5)

(IMVT) = f⁽ⁿ⁺¹⁾(ξ)/n! ∫_{x0}^{x} (x - t)ⁿ dt = (x - x0)ⁿ⁺¹/(n+1)! f⁽ⁿ⁺¹⁾(ξ)   (ξ between x0 and x)   (5′)

= truncation error of the expansion = T.E.
Taylor Series illustration

f(x) = f(x0) + f′(x0)(x - x0) + f″(x0)/2! (x - x0)² + f‴(x0)/3! (x - x0)³ + ...

0th order: f(x) ~ f(x0)
1st order: f(x) ~ f(x0) + f′(x0)(x - x0)
2nd order: f(x) ~ f(x0) + f′(x0)(x - x0) + f″(x0)/2! (x - x0)²

(Figure: the true f(x) compared with its 0th-, 1st-, and 2nd-order Taylor approximations about x0.)
Truncation Error in Taylor Series Expansion

Taylor series expansion:

f(x) = f(x0 + x - x0) = f(x0) + (x - x0) f′(x0) + (x - x0)²/2! f″(x0) + (x - x0)³/3! f‴(x0) + ...

Examples (higher-order terms truncated), with x0 = 0:

f(x) = e^(-x) = 1 - x + x²/2! - x³/3! + x⁴/4! - x⁵/5! + ...

f(x) = sin x = x - x³/3! + x⁵/5! - x⁷/7! + x⁹/9! - ...
Examples: Truncation Error in Taylor Series Expansion

Taylor series expansion of f(xi+1) and f(xi-1) about f(xi):

f(x) = f(a + x - a) = f(a) + (x - a) f′(a) + (x - a)²/2! f″(a) + (x - a)³/3! f‴(a) + ...

Set x0 = a = xi, let x = xi+1, h = x - a = xi+1 - xi =>

f(xi+1) = f(xi + h) = f(xi) + h f′(xi) + h²/2! f″(xi) + h³/3! f‴(xi) + ...

Set a = xi, let x = xi-1, h = -(x - a) = xi - xi-1 =>

f(xi-1) = f(xi - h) = f(xi) - h f′(xi) + h²/2! f″(xi) - h³/3! f‴(xi) + ...

Anything that is truncated leads to TRUNCATION error.
Examples of Truncation Error in Taylor series expansion

Find: the remainder R4 in the Taylor series expansion of sin(x), using
Rn+1(x) = 1/n! ∫_{x0}^{x} (x - t)ⁿ f⁽ⁿ⁺¹⁾(t) dt

Soln. For f(x) = sin(x): f′(x) = cos(x), f″(x) = -sin(x), f‴(x) = -cos(x), f⁽⁴⁾(x) = sin(x)
=> at x0 = 0: f′(0) = 1, f″(0) = 0, f‴(0) = -1, f⁽⁴⁾(0) = 0

=> f(x) = sin(x) = x - x³/3! + ... = P3(x) + R4(x)

R4(x) = 1/3! ∫_0^x (x - t)³ sin(t) dt.  Let z = x - t  =>  R4(x) = 1/3! ∫_0^x z³ sin(x - z) dz

Integration by parts =>

R4(x) = 1/3! [sin(x - z) z⁴/4 |_0^x + ∫_0^x z⁴/4 cos(x - z) dz] = 1/4! ∫_0^x cos(u)(x - u)⁴ du

(IMVT =>) = cos(ξ)/4! ∫_0^x (x - u)⁴ du = x⁵/5! cos(ξ)

Hence sin(x) = x - x³/3! + x⁵/5! cos(ξ), for ξ between 0 and x.

Since cos(ξ) ~ 1 for small |x|, R4 ~ x⁵/5!.

(Figure: R4 = sin(x) - P3(x) plotted for x on [0, 1.2]; it closely follows x⁵/5!.)
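The behavior R4 ~ x⁵/5! is easy to check numerically; a small sketch (illustrative, not from the lecture):

```python
import math

def P3(x):
    # Third-order Taylor polynomial of sin(x) about x0 = 0
    return x - x**3 / 6.0

# Remainder R4(x) = sin(x) - P3(x) should behave like cos(xi) * x^5 / 5!,
# so the ratio R4 / (x^5/120) approaches 1 as x -> 0.
for x in (0.5, 0.25, 0.1):
    r4 = math.sin(x) - P3(x)
    print(x, r4, r4 / (x**5 / 120.0))
```

The next term of the sine series shows the ratio is 1 - x²/42 + ..., which is why it approaches 1 from below.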
1.2 Mathematical Preliminary

1.2.5 Taylor series expansion in two dimensions

Let f(x, y) be n+1 times continuously differentiable for all (x, y) in some neighborhood of (x0, y0). Then

f(x0 + ξ, y0 + η) = f(x0, y0) + Σ_{m=1}^{n} 1/m! Dᵐf |_{(x0, y0)} + 1/(n+1)! Dⁿ⁺¹f |_{(x0+θξ, y0+θη)}   (8)

where D = ξ ∂/∂x + η ∂/∂y,

D² = ξ² ∂²/∂x² + 2ξη ∂²/∂x∂y + η² ∂²/∂y²,

and 0 ≤ θ ≤ 1.

Example: Find the Taylor series expansion of f(x, y) = ln[(1 + 2x + x² + xy + y³)^(1/2)] near x = y = 0.

Sol: f(0, 0) = 0;

∂f/∂x = (2 + 2x + y) / (2[1 + 2x + x² + xy + y³])  =>  ∂f/∂x (0,0) = 1

∂f/∂y = (x + 3y²) / (2[1 + 2x + x² + xy + y³])  =>  ∂f/∂y (0,0) = 0

∂²f/∂x² = {2[1 + 2x + x² + xy + y³] - (2 + 2x + y)²} / (2[1 + 2x + x² + xy + y³]²)  =>  ∂²f/∂x² (0,0) = -1

∂²f/∂y² = {6y[1 + 2x + x² + xy + y³] - (x + 3y²)²} / (2[1 + 2x + x² + xy + y³]²)  =>  ∂²f/∂y² (0,0) = 0

∂²f/∂x∂y = {[1 + 2x + x² + xy + y³] - (x + 3y²)(2 + 2x + y)} / (2[1 + 2x + x² + xy + y³]²)  =>  ∂²f/∂x∂y (0,0) = 1/2

Thus, f(x, y) ~ 0 + 1·x + 0·y + 1/2 [(-1)x² + 2·(1/2)xy + 0·y²]
            = x - x²/2 + xy/2 + ...
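A quick numerical spot-check of the two-dimensional expansion (an illustrative sketch; the sample points are arbitrary):

```python
import math

def f(x, y):
    # f(x, y) = ln[(1 + 2x + x^2 + x*y + y^3)^(1/2)]
    return 0.5 * math.log(1.0 + 2.0*x + x*x + x*y + y**3)

def taylor2(x, y):
    # Second-order 2-D Taylor expansion about (0, 0): x - x^2/2 + x*y/2
    return x - 0.5*x*x + 0.5*x*y

# The error should shrink like the cube of the distance from (0, 0).
for h in (0.1, 0.01):
    print(h, abs(f(h, h) - taylor2(h, h)))
```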
1.2 Mathematical Preliminary

1.2.6 Fourier series & Fourier transform

i. Periodic functions

f(x) is a periodic function if f(x + p) = f(x), where p is the period; e.g. for p = 2L, f(x + 2L) = f(x).

ii. Fourier series expansion

f(x) = a0 + Σ_{n=1}^{∞} [an cos(nπx/L) + bn sin(nπx/L)]

a0 = 1/(2L) ∫_{-L}^{L} f(x) dx,  ak = 1/L ∫_{-L}^{L} f(x) cos(kπx/L) dx,  bk = 1/L ∫_{-L}^{L} f(x) sin(kπx/L) dx

iii. Fourier transform F:

f̂(w) = 1/√(2π) ∫_{-∞}^{∞} f(x) e^(-iwx) dx,   for f(±∞) = 0 and ∫ |f(x)| dx < ∞

Fourier inverse transform: f(x) = 1/√(2π) ∫_{-∞}^{∞} f̂(w) e^(iwx) dw.

|f̂(w)|² = spectral density or energy spectrum.
Some Properties of Fourier transform

a) Linearity of the Fourier transform

F{a f(x) + b g(x)} = 1/√(2π) ∫_{-∞}^{∞} [a f(x) + b g(x)] e^(-iwx) dx
= a/√(2π) ∫_{-∞}^{∞} f(x) e^(-iwx) dx + b/√(2π) ∫_{-∞}^{∞} g(x) e^(-iwx) dx = a F{f(x)} + b F{g(x)}

b) Fourier transform of derivatives of f(x):

F{f′(x)} = 1/√(2π) ∫_{-∞}^{∞} f′(x) e^(-iwx) dx = 1/√(2π) {f(x) e^(-iwx) |_{-∞}^{+∞} + iw ∫_{-∞}^{∞} f(x) e^(-iwx) dx} = iw F{f(x)}

F{f′(x)} = iw F{f(x)}.  Similarly, F{f″(x)} = (iw)² F{f(x)} = -w² F{f(x)}.

Application: solve ay″ + by′ + cy = g(t) using the Fourier transform

F{ay″ + by′ + cy = g(t)}  =>  -w² a ŷ(w) + iwb ŷ(w) + c ŷ(w) = ĝ(w)

ŷ(w) = ĝ(w)/[c + ibw - aw²]  =>  y(t) = F⁻¹{ŷ(w)}

We will use the Fourier transform to study the solution behavior of:

∂u/∂t + c ∂u/∂x = α ∂²u/∂x² + D ∂³u/∂x³

to understand the roles of advection (c), diffusion (α), and dispersion (D).
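The derivative property F{f′} = iw F{f} can be verified numerically. The sketch below (an assumption-laden illustration, not lecture material) uses a Gaussian test function, whose transform is known, and a simple trapezoid rule on a window wide enough that the tails are negligible:

```python
import cmath
import math

def ft(g, w, a=-10.0, b=10.0, n=4000):
    """Trapezoid-rule approximation of (1/sqrt(2*pi)) * integral g(x) e^{-iwx} dx."""
    h = (b - a) / n
    s = 0.5 * (g(a) * cmath.exp(-1j*w*a) + g(b) * cmath.exp(-1j*w*b))
    for k in range(1, n):
        x = a + k * h
        s += g(x) * cmath.exp(-1j * w * x)
    return s * h / math.sqrt(2.0 * math.pi)

f = lambda x: math.exp(-0.5 * x * x)        # Gaussian; f-hat(w) = e^{-w^2/2}
fp = lambda x: -x * math.exp(-0.5 * x * x)  # its derivative

w = 1.5
lhs = ft(fp, w)               # F{f'}
rhs = 1j * w * ft(f, w)       # i*w*F{f}
print(abs(lhs - rhs))
```

Because the Gaussian decays to ~e⁻⁵⁰ at the window edges, the trapezoid rule here is accurate to near machine precision.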
Lecture 3

Errors

Floating Point Arithmetic

1.3 Sources of Errors in Computations:

1.3.1 Absolute and relative errors:

True value (T.V.) xT = Approximate value (A.V.) xA + Error ε;

i.e. Absolute error = T.V. - A.V.;  ε = xT - xA   (9)

Relative error: Rel.(xA) = (T.V. - A.V.)/T.V. = ε/xT   (10)

1.3.2 Types of errors

• Modeling error -- e.g. neglecting friction in computing a bullet trajectory
• Empirical measurements -- g (gravitational acceleration), h (Planck constant), ...
• Blunders
• Inexact input data -- e.g. weather prediction based on collected data
• Round-off error -- e.g. π ≈ 3.1415927 instead of 3.1415926535897932384...
• Truncation error -- e.g. eˣ ≈ 1 + x + x²/2! + x³/3! + x⁴/4! for small x,
  or dy/dt ≈ (y^(n+1) - y^n)/Δt for small Δt.
1.3 Sources of Errors in Computations:

Example: what causes errors in the surface area of the earth, A = 4πr²?

Errors in various approximations:
† Earth is modeled as a sphere (an idealization)
† Earth radius r ≈ 6378.14 km from measurements
† π ≈ 3.14159265
† Calculator or computer has finite word length; the result is rounded.

Example: Truncation Error in a Taylor series:

f(x) = f(x0) + f′(x0)(x - x0) + f″(x0)/2! (x - x0)² + f‴(x0)/3! (x - x0)³ + ... + (x - x0)ⁿ/n! f⁽ⁿ⁾(x0) + Rn+1(x)

Rn+1(x) = Remainder or Truncation Error (ET).

It is not included in the approximation, but we need to understand Rn+1(x) in order to understand the truncation error.

What about roundoff error? First look into floating point arithmetic.
1.4 Floating Point Arithmetic

1.4.1 Anatomy of a floating-point number

• A floating point number x is represented in scientific notation as

x = ± (.d1 d2 d3 ... dp) Bᵉ;   x = ± (d1/B + d2/B² + d3/B³ + ... + dp/Bᵖ) Bᵉ

± = sign
B = number base: 2 or 10 or 16; it is fixed for a given computing device
d1 d2 ... dp = mantissa or fractional part of the significand; d1 ≠ 0; 0 ≤ di ≤ B-1, i = 1, 2, ..., p
p = number of significant bits, e.g. p = 23  => gives the PRECISION of x
e = exponent or characteristic

• Three fields in an IEEE 754 float:

32 bit IEEE 754 float (single precision):    64 bit (double precision):
X = S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF      X = S EE...EEEE FFFFFFFFF...FFFFFFFFFFF
    0 1      8 9                     31         0 1       11 12                  63
1.4 Floating Point Arithmetic

1.4.2 IEEE Standard for single precision (for base 2 only)

32 bit (single precision): X = S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF (bits 0, 1-8, 9-31)

The value of X is:
• If "E" = 255 and F ≠ 0, then X = NaN ("Not a number")
• If "E" = 255, F = 0, and S = 1, then X = -∞; (-1)^S = -1.
• If "E" = 255, F = 0, and S = 0, then X = +∞; (-1)^S = 1.
• If 0 < "E" < 255, then X = (-1)**S * 2**(E - 127) * (1.F),
  where "1.F" represents the binary number created by prefixing F with an implicit leading 1 (d0 = 1) and a binary point.
• To store "E", the exponent is stored with 127 added to it => "biased with 127";
  none of the 8 bits is used to store the sign of the exponent. Thus the actual exponent is e = "E" - 127.
• Since "E" = 255 is reserved for NaN and ±∞, the largest "E" is 254 => U = 254 - 127 = 127.
  Thus for single precision, -126 = L ≤ e ≤ U = 127; for double precision, -1022 = L ≤ e ≤ U = 1023.
• If "E" = 0 and F ≠ 0, then X = (-1)**S * 2**(-126) * (0.F).
  These are "unnormalized" (subnormal) values; that is why L = -126 is the lower bound of e.
• If "E" = 0, F = 0, and S = 1, then X = -0
• If "E" = 0, F = 0, and S = 0, then X = 0
IEEE Standard

64 bit: X = S EE...EEEE FFFFFFFFF...FFFFFFFFFFF (bits 0, 1-11, 12-63)

                 B    m (mantissa)   L       U      Total length
IEEE Single P    2    23             -126    127    32
IEEE Double P    2    52             -1022   1023   64   <= Matlab

• When prefixing F with an implicit leading 1 (d0 = 1) is used, the smallest positive number is

xL = 1.(000...0)₂ 2ᴸ = 2⁻¹⁰²² = 2.225x10⁻³⁰⁸ = smallest positive normalized number for 64-bit

  o Matlab "realmin" = 2.225x10⁻³⁰⁸
  o In Matlab, when x < 2.23x10⁻³⁰⁸ needs to be represented, prefixing F is "temporarily" abandoned so that
    xL = 0.(000...1)₂ 2ᴸ = 2⁻¹⁰²²⁻⁵² = 2⁻¹⁰⁷⁴ = 4.94x10⁻³²⁴

• Largest positive number x with prefixing F:

xU = 1.(111...1)₂ 2¹⁰²³ = (2 - 2⁻⁵²) * 2¹⁰²³ ~ 2¹⁰²⁴ = 1.7976x10³⁰⁸ = largest positive number

  o Matlab "realmax" = 1.797693134862316e+308
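The field layout and the realmin/realmax/subnormal limits can be inspected directly; a small Python sketch (illustrative, using the standard library rather than Matlab):

```python
import struct
import sys

def fields(x):
    """Extract (sign, biased exponent E, 52-bit fraction F) of an IEEE double."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    E = (bits >> 52) & 0x7FF          # biased exponent, bias = 1023
    F = bits & ((1 << 52) - 1)        # 52-bit fraction
    return sign, E, F

print(fields(1.0))          # E = 1023 => actual exponent e = 0
print(sys.float_info.min)   # ~2.225e-308, Matlab's "realmin"
print(sys.float_info.max)   # ~1.798e+308, Matlab's "realmax"
print(5e-324)               # smallest subnormal: the leading 1 is abandoned
```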
Rounding vs Chopping

Rounding:
e.g. xT = 2/3 = 0.66666666...
To keep 4 decimal places, round to nearest → xA = 0.6667
(the 5th digit 6 > 5, half of the base 10 => add 1 to the 4th digit)

* Example: Round πT = 3.1415926535897932... to 5 decimals.

Solution: Since the 6th decimal digit is 2 < 5 => πA = 3.14159;
then |roundoff error| = |πA - πT| = 0.00000265358979... < 0.000005

* Example: Round xT = 25.24685 to 3 decimals.

Solution: Since the 4th decimal digit is 8 > 5 => add 1 to the 3rd decimal digit
=> xA = 25.247
Rounding vs Chopping

Chopping:
e.g. x = 2/3 = 0.666666666...
To keep 4 decimal places, chopping → xA = 0.6666,
i.e. chop everything off after the 4th decimal.

* Example: Chop xT = 25.24685 to 3 decimals.
Solution: xA = 25.246, regardless of the magnitude of the next digit "8".

Chopping is faster but less accurate.

The error in chopping is always non-negative, since the chopped number is never larger than the original number.

This can cause a systematic skew in a summation Σ_{j=1}^{M} xj.

It greatly affected the index value of the Vancouver Stock Exchange in 1982.

We use rounding instead.
Machine precision or machine epsilon ε or unit round δ

The smallest positive number for which
fl(1 + ε) > 1;  "fl" = floating point operation.
For any x < ε, we have fl(1 + x) = 1.
Matlab: eps = 2.220446049250313e-016

Example 1: x = 8*10⁹, y = 2*10⁻⁷; suppose ε = 2.22*10⁻¹⁶.

Question: can x + y on this computer give an accurate result?
Solution: x + y = 8*10⁹ + 2*10⁻⁷ = 8*10⁹ (1 + 2.5*10⁻¹⁷).
Since 2.5*10⁻¹⁷ < ε = 2.22*10⁻¹⁶, fl(1 + 2.5*10⁻¹⁷) = 1:
y cannot be added to x, and the result of the addition is not accurate.

Note: x + y = x(1 + y/x) because x is the larger number.

What about x = 10¹⁰⁰, y = 10⁸⁰, with ε = 2.22*10⁻¹⁶? Again x + y is not accurate, since y/x = 10⁻²⁰ < ε.
Machine precision or machine epsilon ε or unit round δ

The smallest positive number for which
fl(1 + ε) > 1;  "fl" = floating point operation.
For any x < ε, we have fl(1 + x) = 1.
• Matlab: eps = 2.220446049250313e-016 = 2⁻⁵²
• In Excel: ε ≈ 1.67E-15

n    x = 1/2^n      y = 1 + x             y - 1
45   2.8422E-14     1.0000000000000300    2.8422E-14
46   1.4211E-14     1.0000000000000100    1.4211E-14
47   7.1054E-15     1.0000000000000100    7.1054E-15
48   3.5527E-15     1.0000000000000000    3.5527E-15
49   1.7764E-15     1.0000000000000000    1.7764E-15
     1.67000E-15    1.0000000000000000    1.7764E-15
     1.66000E-15    1.0000000000000000    0.00000E+00
50   8.8818E-16     1.0000000000000000    0.0000E+00
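The same halving experiment in a few lines (an illustrative sketch in IEEE double precision):

```python
# Find machine epsilon by halving until fl(1 + x) is no longer > 1.
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0
print(eps)   # 2^-52 for IEEE double precision
```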


Lecture 4
Errors

Significant Digits

Propagation of Errors
1.5 Significant Digits

Definition: XA has m significant digits w.r.t. XT if the error |XT - XA| has magnitude ≤ 5 in the (m+1)th digit, counting from the left of the first non-zero digit in XT.

Examples:
1. XT = 3.17286
   If XA = 3.17, then |XT - XA| = 0.00286 < 0.005  =>  m + 1 = 4  =>  m = 3
   If XA = 3.173, then |XT - XA| = 0.00014 < 0.0005  =>  m + 1 = 5  =>  m = 4

2. XT = 389.674
   If XA = 389.78, then |XT - XA| = 0.106 < 0.5  =>  m + 1 = 4  =>  m = 3
   If XA = 389.7, then |XT - XA| = 0.026 < 0.05  =>  m + 1 = 5  =>  m = 4
1.6 Truncation + Roundoff Errors

Example: Truncation error & roundoff error in computing f′(x0) using the forward difference method

Forward difference method: f′(x0, dx) = [f(x0 + dx) - f(x0)] / dx

Consider f(x) = eˣ at x0 = 2; f(2) = e², f′(2) = f″(2) = e².

Matlab results:
>> dx=logspace(-15,-0.5,20001);
>> fp=(exp(2+dx)-exp(2))./dx;
>> err=(abs(fp-exp(2)))';
>> plot(log10(dx),log10(err))

(Figure: log10(err) vs log10(dx).)
Truncation + Roundoff Errors

Example: Truncation error vs roundoff error in computing f′(2), with dx from 10⁻¹⁵ to 10⁻⁰·⁵:
>> fp=(exp(2+dx)-exp(2))./dx;

What contributes to the roundoff error (RE)?

exp(2+dx) - exp(2) = exp(2)[exp(dx) - "1"]; the two stored quantities carry relative rounding errors e1 and e2, each of magnitude ≤ εmach, so the roundoff in the numerator is

exp(2)[e1 + e2] < exp(2)*2εmach

TE ~ 1/2 f″(x0)*(dx) = exp(2)/2*(dx)

Total error = RE + TE ~ 2εmach exp(2)/dx + exp(2)/2*(dx)

(Figure: total error vs dx; roundoff dominates for small dx, truncation for large dx.)
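A Python version of the Matlab experiment (an illustrative sketch; the sampled dx values are arbitrary):

```python
import math

def fd_error(dx, x0=2.0):
    """Absolute error of the forward-difference estimate of d/dx e^x at x0."""
    fp = (math.exp(x0 + dx) - math.exp(x0)) / dx
    return abs(fp - math.exp(x0))

# The error falls with dx (truncation ~ dx) until roundoff (~eps/dx) takes
# over; the crossover is near dx ~ sqrt(eps) ~ 1e-8 for double precision.
for dx in (1e-2, 1e-5, 1e-8, 1e-12):
    print(dx, fd_error(dx))
```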
1.7 Propagation of Errors

• Consider zT = xT * yT; * = algebraic operation: + - × ÷

First, the computer actually uses xA instead of xT, due to rounding or because the data itself contains error.
Second, after xA * yA is computed, the computer rounds the result:
zA = fl(xA * yA).   (25)
Thus, the error in the operation * is
zT - zA = xT * yT - fl(xA * yA).   (26)

• Let
xT = xA + ε,  yT = yA + η.   (27)
The error is
zT - zA = [xT * yT - xA * yA] + [xA * yA - fl(xA * yA)]   (28)

The second part in [...] is simply due to machine rounding.
It can easily be estimated: ≤ (xA * yA) εmach = (xA * yA) (1/2) B^(1-p).
The first part, xT * yT - xA * yA, is the propagated error.
1.7.1 Error in multiplication

Absolute error in multiplication:

xT yT - xA yA = xT yT - (xT - ε)(yT - η)
             = xT η + yT ε - εη   (29)

Relative error: Rel.(xA yA) = (xT yT - xA yA)/(xT yT) = η/yT + ε/xT - (ε/xT)(η/yT)   (30)

Assuming |ε/xT| « 1 and |η/yT| « 1, we obtain

Rel.(xA yA) ≈ η/yT + ε/xT = Rel.(xA) + Rel.(yA).   (31)

1.7.2 Error in division

Absolute error in division:
xT/yT - xA/yA = xT/yT - (xT - ε)/(yT - η).   (32)

Relative error in division:

Rel.(xA/yA) = (xT/yT - xA/yA)/(xT/yT) = 1 - (xA yT)/(xT yA) = 1 - [1 - Rel.(xA)]/[1 - Rel.(yA)]
            ≈ 1 - [1 - Rel.(xA)][1 + Rel.(yA) + ...] ≈ Rel.(xA) - Rel.(yA)
            = ε/xT - η/yT   (33)
1.7.3 Error in addition:

Absolute error: xT + yT - (xA + yA) = ε + η   (34)

Relative error: Rel.(xA + yA) = (ε + η)/(xT + yT)   (35)

1.7.4 Error in subtraction:

Absolute error: xT - yT - (xA - yA) = ε - η   (36)

Relative error: Rel.(xA - yA) = (ε - η)/(xT - yT)   (37)

Note: xT ± yT may be small due to cancellation
→ large Rel.(xA ± yA);

i.e. loss of significance due to subtraction of nearly equal quantities -- a very important practical issue!
Example of Error in subtraction:

Compute r = 13 - √168 (= x - y).

Using 5-digit decimals, y = √168 => yA = 12.961 => rA = 0.039

Exact number: rT = 0.038518603...
=> Error(rA) = 0.038518603... - 0.039 = -0.00048,
or Rel.(rA) = -1.25x10⁻², which is not small.
Reason: x = 13 and y = √168 are quite close => rA has only 2 significant digits after the subtraction.

Improvement: rA = (13² - 168)/(13 + √168) = 1/(13 + √168) = 1/(13 + 12.961) = 0.038519, with 5 significant digits.

=> Rel.(rA) = (0.038518603... - 0.038519)/0.038518603... = -1.03x10⁻⁵

The magnitude of this error is much smaller than the previous one, 1.25x10⁻².

Lesson: avoid subtraction of two close numbers! Whenever you can, use double precision.
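The 5-digit arithmetic above can be simulated by rounding every intermediate result to 5 significant digits; the following sketch (an illustration, not lecture code) reproduces both computations:

```python
import math

def sig5(v):
    """Round v to 5 significant decimal digits (simulating a 5-digit machine)."""
    if v == 0.0:
        return 0.0
    d = math.floor(math.log10(abs(v)))
    return round(v, 4 - d)

rT = 13.0 - math.sqrt(168.0)            # "exact": 0.038518603...

# Naive form: subtraction of two close numbers loses significance.
naive = sig5(13.0 - sig5(math.sqrt(168.0)))          # 0.039

# Rationalized form avoids the cancellation: 1 / (13 + sqrt(168)).
better = sig5(1.0 / sig5(13.0 + sig5(math.sqrt(168.0))))  # 0.038519

print(rT, naive, better)
```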
1.7.5 Induced error in evaluating functions

• With one variable:

If f(x) has a continuous first-order derivative in [a, b], and xT and xA are in [a, b],

f(xT) - f(xA) ≈ f′(xA)(xT - xA) + o(xT - xA)   (38)

• With two variables:

f(xT, yT) - f(xA, yA) ≈ f′x(xA, yA)(xT - xA) + f′y(xA, yA)(yT - yA) + o(xT - xA, yT - yA)   (39)

* Example: f(x, y) = xʸ  =>  f′x = y xʸ⁻¹,  f′y = xʸ log x

=> Error(fA) ≈ yA xA^(yA - 1) Error(xA) + xA^(yA) log(xA) Error(yA)

=> Rel.(fA) ≈ yA [Rel.(xA) + log xA · Rel.(yA)]
1.7.6 Error in summation

Consider S = Σ_{j=1}^{M} xj   (40)

Fortran program:
      S = 0
      DO J = 1, M
        S = S + X(J)
      ENDDO

Equivalently, in the above code we are doing the following:

s2 = fl(x1 + x2) = (x1 + x2)(1 + ε2)   (41a)
where ε2 = machine error due to rounding

s3 = fl(s2 + x3) = (s2 + x3)(1 + ε3)   (41b)
   = [(x1 + x2)(1 + ε2) + x3](1 + ε3)
   ≈ (x1 + x2 + x3) + ε2(x1 + x2) + ε3(x1 + x2 + x3)   (41c)

sk+1 = (sk + xk+1)(1 + εk+1) ≈ (x1 + x2 + ... + xk+1) + ε2(x1 + x2) + ε3(x1 + x2 + x3) + ...
       + εk+1(x1 + x2 + ... + xk+1)   (41d)
1.7.6 Error in summation

Error = s - (x1 + x2 + x3 + ... + xM)
      = ε2(x1 + x2) + ε3(x1 + x2 + x3) + ... + εM(x1 + x2 + x3 + ... + xM)
      = (x1 + x2)(ε2 + ε3 + ... + εM) + x3(ε3 + ε4 + ... + εM) + ... + xM εM   (42)

Since all εi's are of the same magnitude,
=> the term x1 contributes the most while xM contributes the least;
=> we should add from the smallest (as x1) to the largest (as xM)
to reduce the overall machine error accumulation.
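The effect of summation order can be shown by simulating a low-precision machine: round each partial sum to a few significant digits. This sketch is an illustration (6-digit rounding stands in for single precision; the harmonic sum and M are taken from the example that follows):

```python
import math

def sig(v, p=6):
    """Round v to p significant digits (crude model of a low-precision machine)."""
    if v == 0.0:
        return 0.0
    return round(v, p - 1 - math.floor(math.log10(abs(v))))

M = 100000
terms = [1.0 / k for k in range(1, M + 1)]
true = math.fsum(terms)               # accurate reference sum

s_large = 0.0                          # add largest (1/1) down to smallest (1/M)
for t in terms:
    s_large = sig(s_large + t)

s_small = 0.0                          # add smallest (1/M) up to largest (1/1)
for t in reversed(terms):
    s_small = sig(s_small + t)

print(abs(s_large - true), abs(s_small - true))
```

Large-to-small loses the tail: once the partial sum is ~12, terms below half its last kept digit no longer register. Small-to-large lets the tiny terms accumulate while the partial sum is still small.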
Example: Compute S(M) = Σ_{k=1}^{M} 1/k for M < 10⁸

i)   summing from k=1 to M using single precision (single: large to small)
ii)  summing from k=M to 1 using single precision (single: small to large)
iii) summing from k=1 to M using double precision (double: large to small)
iv)  summing from k=M to 1 using double precision (double: small to large)

asymptote = ln(M) + 0.5772156649015328

M          SP: large to small   SP: small to large   DP: large to small   DP: small to large   asymptote
16384      10.2813063           10.28131294          10.2813068           10.28130678          10.2812767
32768      10.9744091           10.97444344          10.9744387           10.9744387           10.9744225
65536      11.667428            11.66758823          11.6675783           11.66757825          11.6675701
131072     12.3600855           12.36073208          12.3607216           12.36072161          12.3607178
262144     13.0513039           13.05388069          13.0538669           13.05386689          13.0538654
524288     13.7370176           13.74705601          13.7470131           13.74701311          13.7470112
1048576    14.4036837           14.44023132          14.4401598           14.44015982          14.4401588
2097152    15.4036827           15.13289833          15.1333068           15.13330676          15.1333065
4194304    15.4036827           15.82960701          15.8264538           15.82645382          15.8264542
8388608    15.4036827           16.51415253          16.5196009           16.51960094          16.5195999
16777216   15.4036827           17.23270798          17.2127481           17.21274809          17.2127476
(Figure: the four sums vs M up to 1.8x10⁷, with the asymptote; the single-precision large-to-small sum stalls at 15.4036827 while the other three track the asymptote.)
II. Root Finding

Goal: Find the value of x where a function y = f(x) = 0.

Variation: find x = r such that p(x) = 1.6; equivalently, find the root of f(x) = p(x) - 1.6.

Bracketing Methods          Open Methods
  Bisection                   Newton-Raphson
  False Position              Secant Methods
                              Fixed-Point Iteration
                              Muller's Method
                              Repeated roots
Supplemental Reading for EGM6341
Chapter ONE Mathematical Preliminary & Error Analysis
Objectives: Provide a sufficient number of examples that are related to:
mean value theorem;
Taylor series expansion;
asymptotics;
error analyses;
acceleration of convergence

Example 1:
(This example illustrates how the intermediate value theorem is used to derive other important results of interest.)

Given the Intermediate Value Theorem:
Let f(x) be a continuous function on the finite interval a ≤ x ≤ b, and define
m = Infimum f(x), M = Supremum f(x)   (both over a ≤ x ≤ b)
Then for any number z in the interval [m, M], there is at least one point ξ in [a, b] for which
f(ξ) = z.

Prove the Integral Mean Value Theorem:
Let w(x) be nonnegative and integrable on [a, b] and let f(x) be continuous on [a, b]. Then

∫_a^b w(x) f(x) dx = f(ξ) ∫_a^b w(x) dx   (*)

for some ξ in [a, b].


Proof:
Since f(x) is continuous on [a, b], there exist a supremum M and an infimum m so that
m ≤ f(x) ≤ M for x in [a, b].
Thus,
m w(x) ≤ f(x) w(x) ≤ M w(x).
Integrating this inequality from a to b, we get

m ∫_a^b w(x) dx ≤ ∫_a^b w(x) f(x) dx ≤ M ∫_a^b w(x) dx

If ∫_a^b w(x) dx = 0, then (*) is valid for any ξ in [a, b].

If ∫_a^b w(x) dx ≠ 0, then

m ≤ [∫_a^b w(x) f(x) dx] / [∫_a^b w(x) dx] ≤ M

Since f(x) is continuous and takes values between m and M, by the intermediate value theorem there must be a point ξ in [a, b] so that

f(ξ) = [∫_a^b w(x) f(x) dx] / [∫_a^b w(x) dx].

Multiplying both sides by ∫_a^b w(x) dx, we obtain (*).

(A similar approach is used in solving Probs. 1 & 8b on pp. 43-45 of the textbook.)
Example 2: Application of Taylor series expansion for finding a limit
(This example illustrates how Taylor series expansion can be used to find the limit of a complicated function that is difficult to handle using other techniques.)

Use Taylor series expansion to find the following limit:

lim_{x→∞} x² [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x] = ?
Solution 1: factoring (1 + 1/x)^x out first

x² [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x] = x² (1 + 1/x)^x [ (1 + 1/(x+1))^(x+1) / (1 + 1/x)^x - 1 ]
                                       = x² (1 + 1/x)^x [ (x+2)^(x+1) xˣ / (x+1)^(2x+1) - 1 ]

Let y = x + 1; the square-bracket term becomes

(y+1)ʸ (y-1)^(y-1) / y^(2y-1) - 1 = (1 + 1/y)ʸ (1 - 1/y)ʸ · y/(y-1) - 1 = (1 - 1/y²)ʸ · y/(y-1) - 1

Expanding both factors for large y:

(1 - 1/y²)ʸ = 1 - y·(1/y²) + [y(y-1)/2!](1/y⁴) - [y(y-1)(y-2)/3!](1/y⁶) + ...
            = 1 - 1/y + 1/(2y²) + O(1/y³)

y/(y-1) = 1 + 1/y + 1/y² + ...

so that

(1 - 1/y²)ʸ · y/(y-1) - 1 = [1 - 1/y + 1/(2y²) + ...][1 + 1/y + 1/y² + ...] - 1
                          = 1/(2y²) + O(1/y³) = 1/(2(x+1)²) + O(1/(x+1)³)

Hence,

lim_{x→∞} x² [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x] = lim_{x→∞} (1 + 1/x)^x [ x²/(2(x+1)²) + O(1/x) ] = e/2

Solution 2: directly expanding (1 + 1/x)^x

Note: the standard binomial expansion gives:

(1 + 1/x)^x = 1 + x/1! · 1/x + x(x-1)/2! · 1/x² + x(x-1)(x-2)/3! · 1/x³ + ... + x(x-1)...(x-n+1)/n! · 1/xⁿ + ...

(1 + 1/(x+1))^(x+1) = 1 + (x+1)/1! · 1/(x+1) + (x+1)x/2! · 1/(x+1)² + (x+1)x(x-1)/3! · 1/(x+1)³ + ...
                      + (x+1)x...(x-n+2)/n! · 1/(x+1)ⁿ + ...

Subtracting the two expressions gives

(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x
= [ (x+1)x/2! · 1/(x+1)² - x(x-1)/2! · 1/x² ] + [ (x+1)x(x-1)/3! · 1/(x+1)³ - x(x-1)(x-2)/3! · 1/x³ ] + ...
  + [ (x+1)x...(x-n+2)/n! · 1/(x+1)ⁿ - x(x-1)...(x-n+1)/n! · 1/xⁿ ] + ...

= [ x/(2(x+1)) - (x-1)/(2x) ] + [ x(x-1)/(6(x+1)²) - (x-1)(x-2)/(6x²) ] + ...

= 1/(2x(x+1)) + (x-1)(3x+2)/(6x²(x+1)²) + ... + (x-1)(x-2)...(x-n+2)/(2(n-2)! x^(n-1)(x+1)^(n-1)) [x^(n-2) + h.o.t.] + ...

where "h.o.t." = terms smaller than O(x^(n-2)) for large x.

In the limit of x → ∞, the nth term behaves like 1/(2x²) · 1/(n-2)!, so the sum becomes

(1/(2x²)) [1 + 1 + 1/2! + 1/3! + ... + 1/n! + ...] = e/(2x²)

Thus,

lim_{x→∞} x² [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x] = lim_{x→∞} x² [e/(2x²)] = e/2

Comment:
• The first approach is clearly easier than the second, because the second approach involves summing the contributions of an infinite series, while the first involves only the first few non-zero terms in the Taylor series expansions of (1 - 1/y²)ʸ and y/(y-1).
• This problem is difficult to solve using other techniques.


Numerical experiment:  f(x) = x² [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x]

x         f(x)            e/2
10 1.04565518 1.35914091
20 1.186203513 1.35914091
40 1.268014636 1.35914091
80 1.312324033 1.35914091
160 1.335406887 1.35914091
320 1.347190903 1.35914091
640 1.353145019 1.35914091
1280 1.356138192 1.35914091
2560 1.357638172 1.35914091
5120 1.358381985 1.35914091
10240 1.359002572 1.35914091
20480 1.361107267 1.35914091
40960 1.339764148 1.35914091
81920 1.119536161 1.35914091
163840 1.874947548 1.35914091
327680 2.788066864 1.35914091
655360 -43.78414154 1.35914091
1310720 -715.4510498 1.35914091
• We certainly hope that as x becomes large, we will get the desired limit.
• The closest answer we can get from this "brute force" calculation is 1.35900,
  but in general we would not be able to identify the most accurate value without
  knowing the actual answer in advance.
• As you can see, plugging a large value of x into the function f(x) does not
  produce a satisfactory result. The error is caused by roundoff in subtracting
  the two close numbers, (1 + 1/(x+1))^(x+1) and (1 + 1/x)^x, as x increases.
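The table's breakdown is easy to reproduce in double precision (an illustrative sketch; the sampled x values are arbitrary):

```python
import math

def f(x):
    # f(x) = x^2 * [(1 + 1/(x+1))^(x+1) - (1 + 1/x)^x]; the true limit is e/2.
    a = (1.0 + 1.0/(x + 1.0))**(x + 1.0)
    b = (1.0 + 1.0/x)**x
    # a and b are two nearly equal numbers near e; their true difference
    # shrinks like e/(2x^2), eventually falling below rounding noise.
    return x*x * (a - b)

half_e = math.e / 2.0
for x in (100.0, 1e4, 1e8):
    print(x, f(x), half_e)
```

For moderate x the value creeps toward e/2; for very large x the difference a - b is smaller than one ulp of e, so the computed f(x) is dominated by roundoff.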

Example 3: Application of Taylor series expansion to an inequality

(This example illustrates how Taylor series can be used to prove an inequality.)

Use an appropriate number of terms of the Taylor series to prove the following inequality:

(sin x / x)³ ≥ cos x   for 0 < x ≤ π/2.

Solution:

(Figure: (sin x / x)³ and cos x plotted on (0, π/2]; the former stays above the latter.)

For sin(x), expanding the first 3 terms of the Taylor series at x = 0 gives

sin x = x - x³/6 + x⁵/120 - ... > x - x³/6   (alternating series, 0 < x ≤ π/2)

Hence, (sin x / x)³ ≥ (1 - x²/3!)³ = 1 - x²/2 + x⁴/12 - x⁶/216.

On the other hand,

cos x = 1 - x²/2! + x⁴/4! - x⁶/6! + x⁸/8! - ...   (TS expansion)
      ≤ 1 - x²/2! + x⁴/4! - x⁶/6! + x⁸/8!   (only 5 terms; alternating series)

Thus it is sufficient to prove that

1 - x²/2 + x⁴/12 - x⁶/216 ≥ 1 - x²/2! + x⁴/4! - x⁶/6! + x⁸/8!

or, after cancelling and dividing by x⁴,

1/4! + (-1/216 + 1/720) x² - x⁴/8! ≥ 0.

In this inequality, the left-hand side is decreasing on the interval (0, π/2], so its minimum value is attained at x = π/2.
Furthermore, at x = π/2 (using π/2 < 2),

1/4! + (-1/216 + 1/720)(π/2)² - (1/8!)(π/2)⁴ ≥ 1/24 - (1/216)·2² - (1/8!)·2⁴ > 0.

Thus we have proved (sin x / x)³ ≥ cos x for 0 < x ≤ π/2 (the equality holds in the limit x → 0).
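A numerical spot-check of the inequality on a fine grid over (0, π/2] (an illustrative sketch; the grid resolution is arbitrary):

```python
import math

# Check (sin x / x)^3 >= cos x at n equally spaced points on (0, pi/2].
n = 2000
ok = all((math.sin(x) / x)**3 >= math.cos(x)
         for x in (k * (math.pi / 2) / n for k in range(1, n + 1)))
print(ok)
```

Near x = 0 the margin is tiny (the two sides differ at O(x⁴)), which is consistent with equality holding only in the limit.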
Example 4 Illustration complicated Taylor series expansion
(This example illustrates how to perform Taylor series expansion for not so
common functions using the expansion for common functions)
Using Taylor series expansion to obtain the first 4 NON-ZERO terms for small :
1
i. x() = 1 +  2
4

1 1 2
ii. x() = 1 −  (1 −  +  )
2 8
(In some applications, we can find exact solution in terms of small
parameters  such as given above. It would be easier to understand the
solution if it is expressed in power series.)
Solution:
* We note that Taylor series expansion for f(z) = (1+z) (|z|«1) is
 ( − 1)  ( − 1)( − 2)
f(z) = (1+z) = 1 + z + z2 + z 3 + ...
2! 3!
* We will apply the above result (binomial expansion) to both problems.
1 2 1 2
i. For 1 +  , we note that if we let z   and  =1/2
4 4
we can directly apply the Taylor series expansion for f(z) = (1+z)
Thus,
1 1 1 11 1 1 11 1 1 1
1+  2 = 1+ (  2) + ( − 1)(  2 ) 2 + ( − 1)( − 2)(  2 ) 3 + ...
4 2 4 2! 2 2 4 3! 2 2 2 4
1 2 1 4 1 6
= 1+  −  +  + ... (note:  =1/2, z = /4)
8 128 1024
Comment: the important thing is that we have avoided the tedious
1 2
procedures of evaluating the derivatives of f(x) = 1 + x
4
1 1
ii. For f() = 1 −  (1 −  +  2 ) , it is clear that the evaluation of the high
2 8

order derivatives of f() with respect to  is very tedious and it is


prone to error. We should avoid such a procedure if we can.
1 1 2
The trick is to note that if we let z= −  (1 −  +  ) we can apply the
2 8
binomial expansion immediately for =1/2. That is
1 1 1 1 1 11 1 1 1
1 −  (1 −  +  2 ) =1+ [− (1 −  +  2 )] + ( − 1)[− (1 −  +  2 )]2
2 8 2 2 8 2! 2 2 2 8
11 1 1 1 1
+ ( − 1)( − 2)[− (1 −  +  2 )]3 + ...
3! 2 2 2 2 8
Since we have included terms up to O(z³) in the above, we would normally
expect the expansion to give 4 terms involving ε⁰, ε¹, ε², and ε³.
Let's call this our first attempt; it gives

  √(1 − ε(1 − ε/2 + ε²/8)) = 1 + (−ε/2 + ε²/4 − ε³/16) − (1/8)[ε²(1 − ε + ...)]
                             + (1/16)[−ε³ + ...]

                           = 1 − ε/2 + ε²/8 − 0·ε³ + ...

Since we are looking for the first 4 NON-ZERO terms, we naturally started
by only keeping the 1, ε, ε², & ε³ terms.
However, we found that the coefficient of the O(ε³) term is exactly zero.
Thus the above procedure only gives 3 NON-ZERO terms.
The error in the above expansion is O(ε⁴).
To obtain the 4th NON-ZERO term, we have to keep more terms in the
above expansion.
Let's call the procedure below our second attempt:

  √(1 − ε(1 − ε/2 + ε²/8)) = 1 + (−ε/2 + ε²/4 − ε³/16) − (1/8)[ε²(1 − ε + ε²/2 + ...)]
                             − (1/16)[ε³(1 − 3ε/2 + ...)]
                             + (1/4!)(1/2)(1/2 − 1)(1/2 − 2)(1/2 − 3)[−ε(1 − ε/2 + ε²/8)]⁴ + ...

                           = 1 − ε/2 + ε²/8 − 0·ε³ − ε⁴/128 + ...   (must keep ALL O(ε⁴) terms now)

[Figure: exact f(ε) and the 4-term expansion plotted for 0 ≤ ε ≤ 1.2 (left axis),
together with the scaled error, error/ε⁵ (right axis).]

Fig. S1.1 Taylor series expansion for f(ε) = √(1 − ε(1 − ε/2 + ε²/8)).
Note:
i) As you can see from the above plot, the expansion compares very well with
the exact expression.
ii) We can refine the above analysis to find the error in the above expansion to
be about ε⁵/256.
iii) The actual error E(ε) = |√(1 − ε(1 − ε/2 + ε²/8)) − (1 − ε/2 + ε²/8 − ε⁴/128)| is
computed for ε between 0 and 1. Then E(ε)/ε⁵ is plotted (blue thin line). As
can be seen, this ratio is very close to 1/256 ≈ 0.0039 for small ε, indicating
that the analysis for the O(ε⁵) term is also correct.
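The ε⁵/256 claim in note ii) can be checked numerically (a short Python sketch with our own helper names):

```python
import math

def f_exact(eps):
    # f(eps) = sqrt(1 - eps*(1 - eps/2 + eps^2/8))
    return math.sqrt(1.0 - eps*(1.0 - eps/2.0 + eps**2/8.0))

def f_series(eps):
    # the 4 non-zero terms derived in the second attempt
    return 1.0 - eps/2.0 + eps**2/8.0 - eps**4/128.0

eps = 0.02
ratio = abs(f_exact(eps) - f_series(eps)) / eps**5
print(ratio)  # approaches 1/256 ≈ 0.0039 as eps -> 0
```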

Example 5 Application of Taylor series expansion to solve nonlinear equation


(This example illustrates how to use Taylor series expansion to solve a
transcendental equation)
It is required to determine the symmetrically placed pair of non-zero roots of
the equation
sinh(x) = cx (1)
where c>1 is a real constant. Show that, with the abbreviation of
  s = 6(c − 1),  t = x²                                        (2a,b)
the problem can be considered as that of inverting the series
  s = t + (3!/5!) t² + (3!/7!) t³ + (3!/9!) t⁴ + ...           (3)
and deduce the expansion
  x² = s − (1/20) s² + (2/525) s³ − (13/37800) s⁴ + ...        (4)
Solution:
First, we note that from the Taylor series expansion for sinh(x) we can obtain
  sinh(x) = x + x³/3! + x⁵/5! + x⁷/7! + x⁹/9! + … = cx         (5)

Thus,

  (c − 1)x = x³/3! + x⁵/5! + x⁷/7! + x⁹/9! + …                 (6)

Dividing both sides by x/3! (i.e., multiplying by 3!/x), we obtain

  s = 6(c − 1) = x² + (3!/5!) x⁴ + (3!/7!) x⁶ + (3!/9!) x⁸ + …

               = t + (3!/5!) t² + (3!/7!) t³ + (3!/9!) t⁴ + ...   (7)
In order to solve for t in terms of s, we express t as a power series in s,

  t = a0 + a1 s + a2 s² + a3 s³ + a4 s⁴ + ...                  (8)

in which the coefficients (a0, a1, … a4) are to be determined.
Substituting Eqn. (8) into the right hand side of Eqn. (7), we obtain

  s = (a0 + a1 s + a2 s² + a3 s³ + a4 s⁴ + ...)
      + (1/20)(a0 + a1 s + a2 s² + a3 s³ + a4 s⁴ + ...)²
      + (3!/7!)(a0 + a1 s + a2 s² + a3 s³ + a4 s⁴ + ...)³
      + (3!/9!)(a0 + a1 s + a2 s² + a3 s³ + a4 s⁴ + ...)⁴ + …  (9)

Since the above must hold for arbitrary s, the coefficients of the various powers
(s⁰, s¹, s², s³, s⁴, …) must balance on both sides.
Since the left hand side only has a linear term, s, it is clear that
  a0 = 0.                                                      (10)
Now consider the coefficient of s¹ on both sides of Eqn. (9). The 2nd, 3rd, and
4th lines will each give O(s²), O(s³), and O(s⁴) as their leading-order terms.
Thus, s = a1 s gives
  a1 = 1.                                                      (11)
Equation (9) can thus be rewritten as

  s = (s + a2 s² + a3 s³ + a4 s⁴ + ...)
      + (1/20)(s + a2 s² + a3 s³ + a4 s⁴ + ...)²
      + (3!/7!)(s + a2 s² + a3 s³ + a4 s⁴ + ...)³
      + (3!/9!)(s + a2 s² + a3 s³ + a4 s⁴ + ...)⁴ + …          (12)

Now, consider the coefficient of s² on both sides of Eqn. (12). Only the first
two lines contribute at O(s²):

  0 = a2 s² + (1/20) s²

Thus,
  a2 = −1/20                                                   (13)
For the balance of s³ in Eqn. (12), we have

  0 = a3 s³ + (1/20)(2a2) s³ + (3!/7!) s³

Hence,
  a3 = 1/200 − 3!/7! = 2/525                                   (14)
For the balance of s⁴ in Eqn. (12), we note that

  (s + a2 s² + a3 s³ + a4 s⁴ + ...)³ = s³ (1 + a2 s + a3 s² + ...)³ = s³ (1 + 3a2 s + ...)

Thus,

  0 = a4 s⁴ + (1/20)(a2² + 2a3) s⁴ + (3!/7!)(3a2) s⁴ + (3!/9!) s⁴

Hence,
  a4 = −(1/20)(a2² + 2a3) − (3!/7!)(3a2) − 3!/9!
     = −13/37800                                               (15)
Finally,
  x² = s − (1/20) s² + (2/525) s³ − (13/37800) s⁴ + ...        (4)
20 525 37800
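The inverted series (4) can be checked against a direct root solve of sinh(x) = cx (a Python sketch; the sample value c = 1.05 and the bisection bracket are our choices, not from the notes):

```python
import math

c = 1.05                       # sample value, c > 1
s = 6.0 * (c - 1.0)

# series inversion, Eq. (4): x^2 = s - s^2/20 + 2 s^3/525 - 13 s^4/37800 + ...
x2_series = s - s**2/20.0 + 2.0*s**3/525.0 - 13.0*s**4/37800.0

# solve sinh(x) = c x directly by bisection for the positive root;
# sinh(x) - c*x changes sign on [0.1, 2] for this c
lo, hi = 0.1, 2.0
for _ in range(200):
    mid = 0.5*(lo + hi)
    if math.sinh(mid) - c*mid < 0.0:
        lo = mid
    else:
        hi = mid
x_root = 0.5*(lo + hi)

print(x2_series, x_root**2)    # the two values agree closely for this c
```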

Example 6 Application of Taylor series expansion to evaluate an integral,
and introduction of asymptotic expansion

Evaluate the Stieltjes integral f(ε) = ∫₀^∞ e^(−t)/(1 + εt) dt for small ε (ε > 0).

Solution:
* Two observations first:
i) We first note that the integral exists for any ε ≥ 0.
ii) We can also perform a numerical integration for ε ≥ 0 to get an accurate value.
* Performing the Taylor series expansion for 1/(1 + εt) first, then integrating,
we obtain

  f(ε) = ∫₀^∞ e^(−t)/(1 + εt) dt ~ ∫₀^∞ e^(−t) [1 − εt + (εt)² − (εt)³ + ...] dt

       ~ 1 − ε + 2!ε² − 3!ε³ + 4!ε⁴ − ... + (−1)ⁿ n!εⁿ + …

(you can also derive the above using integration by parts)
Now we notice that the above is a divergent series with zero radius of
convergence, and one may naively abandon the above result.

But f(ε) = 1 − ε + 2ε² + O(ε³) is a valid three-term expansion for f(ε) as ε → 0.
It gives a good approximation to f(ε) for small ε.
[Figure: f(ε) over 0 ≤ ε ≤ 0.5, comparing the numerically integrated value
("Integ") with the N-term partial sums for N = 2, 3, 4, 5, and 6; the curves
converge as ε → 0.]

Fig. S1.2 N-term asymptotic expansion for f(ε) = ∫₀^∞ e^(−t)/(1 + εt) dt.
Fig. S1.2 shows convergence of the series in the limit ε → 0 for several values
of N, in comparison with the numerically integrated result. Clearly, at any
given ε, increasing N beyond a point leads to poor accuracy.
* Further comments on asymptotic expansion
For a given ε, an indefinite increase in N will quickly lead to divergence.
Such an expansion is called an asymptotic expansion. For an asymptotic
expansion, the magnitude of the error associated with the retention of only n
terms can be made arbitrarily small by taking a parameter x sufficiently near
a certain fixed value x0 (or sufficiently large in magnitude). The error often
first decreases as n increases but eventually grows unboundedly in
magnitude with increasing n, when x is given a fixed value other than x0. In
solving engineering and physical problems, we are often interested only in
the limit of small ε, and we only need the first few terms in the
expansion. This type of asymptotic expansion is then extremely valuable.

[Figure: relative error of the asymptotic series at a fixed ε, plotted against
the number of retained terms N = 1, …, 9; the error first decreases with N
and then grows.]

Fig. S1.3 Behavior of the divergent asymptotic series as the
number of terms N increases.
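This behavior can be demonstrated numerically (a Python sketch; the trapezoid-rule integrator and the truncation point t_max are our choices): the partial-sum error first shrinks with N, then grows again at fixed ε.

```python
import math

def f_numeric(eps, t_max=50.0, n=200000):
    # trapezoid rule for f(eps) = integral_0^inf e^{-t}/(1+eps*t) dt;
    # the integrand beyond t_max ~ e^{-50} is negligible
    h = t_max / n
    total = 0.5 * (1.0 + math.exp(-t_max) / (1.0 + eps*t_max))
    for i in range(1, n):
        t = i * h
        total += math.exp(-t) / (1.0 + eps*t)
    return total * h

def partial_sum(eps, n_terms):
    # asymptotic series: sum of (-1)^n n! eps^n, n = 0 .. n_terms-1
    return sum((-1)**n * math.factorial(n) * eps**n for n in range(n_terms))

eps = 0.1
f_val = f_numeric(eps)
for n in (2, 4, 8, 15):
    print(n, abs(partial_sum(eps, n) - f_val))
# the error first decreases with n, then grows: asymptotic, not convergent
```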

Example 7 Application of Taylor series to evaluate the error function

Obtain a Taylor series expansion for the error function.

Solution:

  erf(x) = (2/√π) ∫₀ˣ exp(−t²) dt        (use the TS expansion for exp(−t²) =>)

         = (2/√π) ∫₀ˣ [1 − t² + (1/2!) t⁴ − ... + (−1)ⁿ (1/n!) t²ⁿ + ...] dt

         = (2/√π) [x − (1/3) x³ + (1/(2!·5)) x⁵ − ... + ((−1)ⁿ/(n!(2n+1))) x²ⁿ⁺¹ + ...]   (1)

From this we can also compute the complementary error function

  erfc(x) = 1 − erf(x) = (2/√π) ∫ₓ^∞ exp(−t²) dt               (2)

Comments:
i) The above expansion converges for ALL x.
ii) One needs a lot of terms for large values of x. Consider x=3 below:
n Term in Eq. (1) Series
1 3 3.3851375013
2 -9 -6.7702750026
3 24.3 20.6493387578
4 -52.07142857 -38.1069764431
5 91.125 64.7165751585
6 -134.2022727 -86.7144735638
7 170.3336538 105.4864728915
8 -189.8003571 -108.6802960158
9 188.4047663 103.9117172377
10 -168.5726856 -86.3021893576
11 137.2663297 68.5862774415
12 -102.5428313 -47.1209171238
13 70.75455359 32.7170471262
14 -45.35548307 -18.4611350853
15 27.14626204 12.1701414600
16 -15.23693417 -5.0228976332
17 8.051334536 4.0620605240
18 -4.01890144 -0.4727841360
19 1.900831762 1.6720748248
20 -0.854219942 0.7081908384
21 0.365647804 1.1207802033
22 -0.149417541 0.9521805625
23 0.058408675 1.0180876948
24 -0.021882991 0.9933953835
25 0.007871178 1.0022770567
26 -0.002722502 0.9992050426
27 0.000906842 1.0002283042
28 -0.000291289 0.9998996202
29 9.03433E-05 1.0000015617
30 -2.70871E-05 0.9999709971
31 7.85971E-06 0.9999798658
32 -2.20941E-06 0.9999773728
33 6.02277E-07 0.9999780524
34 -1.59354E-07 0.9999778726
35 4.09593E-08 0.9999779188
36 -1.02357E-08 0.9999779072
37 2.48882E-09 0.9999779100
38 -5.89245E-10 0.9999779094
39 1.35933E-10 0.9999779095
40 -3.0575E-11 0.9999779095
41 6.70952E-12 0.9999779095
Thus, it takes 39 terms to obtain erf(3) with error <10-10.
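This behavior is easy to reproduce (a Python sketch; `math.erf` from the standard library supplies the reference value):

```python
import math

def erf_taylor(x, n_terms):
    # partial sum of Eq. (1): (2/sqrt(pi)) * sum_n (-1)^n x^(2n+1) / (n! (2n+1))
    s = 0.0
    for n in range(n_terms):
        s += (-1)**n * x**(2*n + 1) / (math.factorial(n) * (2*n + 1))
    return 2.0 / math.sqrt(math.pi) * s

x = 3.0
print(abs(erf_taylor(x, 5) - math.erf(x)))   # large: partial sums oscillate wildly
print(abs(erf_taylor(x, 39) - math.erf(x)))  # below 1e-10, as the table shows
```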
If we use the above Taylor series expansion to compute erf(4), we obtain

N Term in Eq. (1) Series (x=4)


1 4 4.5135166684
2 -21.33333333 -19.5585722297
3 102.4 95.9874544809
4 -390.0952381 -344.1878853689
5 1213.62963 1025.2465052750
6 -3177.50303 -2560.1817175018
7 7169.750427 5530.0152979946
8 -14202.93418 -10496.2797422268
9 25064.00149 17785.4173875756
10 -39867.88542 -27200.6739533967
11 57713.51032 37922.0487497251
12 -76647.19157 -48565.0454330769
13 94020.55499 57525.7900978269
14 -107145.9316 -63375.4469744282
15 114007.4937 65268.2338512226
16 -113762.3163 -63098.7938973838
17 106867.6305 57488.4139876707
18 -94833.79647 -49520.0662867978
19 79740.12916 40457.0342442868
20 -63706.01412 -31427.5049114515
21 48478.72294 23274.8761046226
22 -35218.20847 -16464.6166379517
23 24474.87619 11152.3237730494
24 -16301.49015 -7241.9381103186
25 10424.08214 4520.3790123793
26 -6409.788547 -2712.2928497346
27 3795.636759 1570.6245954649
28 -2167.474729 -875.1087335109
29 1195.098848 473.4159090322
30 -637.0135297 -245.3768869931
31 328.6015148 125.4102165303
32 -164.21663 -59.8884076576
33 79.58190531 29.9101563719
34 -37.4333702 -12.3288787211
35 17.10510352 6.9721637426
36 -7.599208969 -1.6026253439
37 3.284893983 2.1039805930
38 -1.382614837 0.5438668149
39 0.567032742 1.1836947485
40 -0.22673948 0.9278466426
41 0.08845639 1.0276589901
42 -0.03368777 0.9896464123
43 0.012531473 1.0037866653
44 -0.004555681 0.9986461296
45 0.001619384 1.0004734089
46 -0.000563126 0.9998379887
47 0.000191658 1.0000542514
48 -6.38716E-05 0.9999821800
49 2.08516E-05 1.0000057084
50 -6.67113E-06 0.9999981809
51 2.09249E-06 1.0000005420
52 -6.4372E-07 0.9999998156
53 1.94295E-07 1.0000000349
54 -5.75587E-08 0.9999999699
55 1.67415E-08 0.9999999888
56 -4.7825E-09 0.9999999834
57 1.34225E-09 0.9999999849
58 -3.70218E-10 0.9999999845
59 1.00383E-10 0.9999999846
60 -2.67651E-11 0.9999999846
And it gives the complementary error function erfc(4) = 1.53946E-8.
For x=5, the result may be wrong:
N Term in Eq. (1) Series (x=5)
1 5 5.6418958355
2 -41.66666667 -41.3739027935
3 312.5 311.2445869238
4 -1860.119048 -1787.6749947270
5 9042.24537 8415.4063049647
6 -36991.00379 -33324.4717392289
7 130417.0005 113835.3546986330
8 -403671.6683 -341659.3461804640
9 1113065.262 914300.3069788110
10 -2766390.271 -2207236.84268605
11 6257311.327 4853382.90060352
12 -12984539.31 -9798100.75642030
13 24887033.69 18283909.58620870
14 -44314518.67 -31719670.08371470
15 73675616.02 51414360.15569000
16 -114870584.1 -78203213.87348940
17 168607391.5 112049854.06706700
18 -233783357.9 -151746416.60681400
19 307147805.1 194832767.83709900
20 -383416625.7 -237806564.97075500
21 455891719.6 276612153.91663300
22 -517485063.8 -307307211.37635900
23 561915599.6 326746644.87613200
24 -584787279.9 -333115138.92040200
25 584290011.8 326185537.91712100
26 -561376678 -307260210.41696900
27 519416011.9 278837996.49592300
28 -463451997.2 -244111582.06271900
29 399277253.2 206424552.34713900
30 -332536578.5 -168802795.13038700
31 268028116.6 133634547.78182600
32 -209289747.9 -102523643.68828400
33 158476612 76298063.79509270
34 -116474217.3 -55129016.55790910
35 83160406.67 38707453.85610550
36 -57727042.86 -26430538.68481010
37 38989916.7 17564871.04126490
38 -25642017.29 -11369047.06687450
39 16431572.94 7171997.51711779
40 -10266399.84 -4412394.18174139
41 6258067.803 2649079.15321135
42 -3723945.725 -1552943.62177522
43 2164478.257 889408.55136123
44 -1229488.451 -497920.60338749
45 682874.7145 272620.99813919
46 -371036.9328 -146049.34701267
47 197313.9369 76595.5887564901
48 -102744.6592 -39339.3442028159
49 52409.48573 19798.4276567957
50 -26199.34094 -9764.362852286550
51 12840.27106 4724.331506125080
52 -6172.032061 -2240.060889646740
53 2910.802666 1044.428198102790
54 -1347.356198 -475.900466214355
55 612.330586 215.040610416355
56 -273.317092 -93.364702248459
57 119.8569738 41.879410053194
58 -51.65460733 -16.406572746527
59 21.88432008 8.287238121318
60 -9.117167959 -2.000384267009
61 3.73602957 2.215273667602
62 -1.506262755 0.515638154847
63 0.59764619 1.190009664747
64 -0.233426365 0.926616217873
65 0.089768497 1.027909119257
66 -0.033999225 0.989545101997
67 0.012684833 1.003858403037
68 -0.004663026 0.998596741792
69 0.001689321 1.000502936157
70 -0.000603266 0.999822223437
71 0.000212396 1.000061886730
72 -7.37414E-05 0.999978678505
73 2.52515E-05 1.000007171743
74 -8.53011E-06 0.999997546546
75 2.84311E-06 1.000000754654
76 -9.35152E-07 0.999999699448
77 3.03595E-07 1.000000042018
78 -9.72978E-08 0.999999932229
79 3.07879E-08 0.999999966970
80 -9.62046E-09 0.999999956114
81 2.96905E-09 0.999999959465
82 -9.05129E-10 0.999999958443
83 2.72609E-10 0.999999958751
84 -8.11278E-11 0.999999958659
85 2.38594E-11 0.999999958686
86 -6.9354E-12 0.999999958678
87 1.9928E-12 0.999999958681
88 -5.66099E-13 0.999999958680
89 1.59006E-13 0.999999958680
90 -4.41656E-14 0.999999958680
According to this table, erf(5) is 0.999999958680 after 88 terms,
which gives the complementary error function erfc(5) = 4.131987E-8.
However, this is wrong in the sense that the present erf(5) is less than erf(4).
In the next example, we will see that erfc(5) ~ 1.5374E-12, which is much
smaller than the value obtained here using the regular Taylor series.
iii) For large values of x (say 5 or larger), the magnitudes of some of the terms in
the expansion are very large while the final result for erf(x) is of
order one, so the roundoff error can swamp the overall machine
accuracy.
iv) We also note the alternating signs of the successive terms with large
magnitudes. This often leads to cancellation of terms and a large roundoff
error in the final result, erf(x).
v) We need to develop an alternative strategy: a large-x expansion.

Example 8 Development of asymptotic expansion for the error function

Obtain an asymptotic expansion for the error function for large x.
Solution:
First we rewrite the error function as

  erf(x) = (2/√π) ∫₀ˣ exp(−t²) dt = (2/√π) [ ∫₀^∞ exp(−t²) dt − ∫ₓ^∞ exp(−t²) dt ]

         = 1 − (2/√π) ∫ₓ^∞ e^(−t²) dt

Now notice that

  ∫ₓ^∞ exp(−t²) dt = ∫ₓ^∞ (−2t e^(−t²)) (−1/(2t)) dt = ∫ₓ^∞ (−1/(2t)) d(e^(−t²))

  (integration by parts =>)  = (1/(2x)) e^(−x²) − ∫ₓ^∞ e^(−t²)/(2t²) dt = …

If we repeatedly use the above "integration by parts" trick, we obtain

  erf(x) = 1 − (1/(x√π)) e^(−x²) [1 − 1/(2x²) + (1·3)/(2x²)² − (1·3·5)/(2x²)³] + R5   (1)

where the remainder R5 is

  R5 = (105/(16√π)) ∫ₓ^∞ (1/t⁹)(−2t e^(−t²)) dt

  |R5| < (105/(16√π)) (1/x⁹) ∫ₓ^∞ (−d e^(−t²)) = (105/(16√π)) e^(−x²)/x⁹   (since t > x)

Comment:
i) the above expansion diverges for any x if we keep adding more terms;
ii) the remainder is quite small when x is large, which means that the
expansion (with 5 terms) is very useful for large values of x;
iii) according to this asymptotic expansion (neglecting R5), we get
     erf(4) ~ 0.99999998458241, or erfc(4) = 1.5416E-8,
     and erf(5) ~ 0.99999999999846, or erfc(5) = 1.537434E-12.

OPEN ENDED QUESTION: since keeping more terms in the asymptotic
expansion will eventually cause the error to increase, while the Taylor series
expansion for large x suffers from roundoff error, how can we obtain a
reliably accurate result for, say, erfc(4)?
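The four-term asymptotic expansion (neglecting R5) is trivial to evaluate, and for large x it already matches the standard-library `math.erfc` to several digits (a Python sketch):

```python
import math

def erfc_asym(x):
    # erfc(x) ~ e^{-x^2}/(x sqrt(pi)) * [1 - 1/(2x^2) + 1*3/(2x^2)^2 - 1*3*5/(2x^2)^3]
    u = 1.0 / (2.0 * x * x)
    bracket = 1.0 - u + 3.0*u**2 - 15.0*u**3
    return math.exp(-x*x) / (x * math.sqrt(math.pi)) * bracket

for x in (4.0, 5.0):
    print(x, erfc_asym(x), math.erfc(x))
```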

Example 9 Error propagation and a comprehensive error analysis

One is using  e^(−x) ≈ 1 − x + (1/2)x² − (1/6)x³              (1)

to compute e^(−1/3). Analyze how large the error is (assuming that we do
not know the exact value of e^(−1/3)).


Solution:
* There are three aspects we need to analyze:
i) Truncation error in using 4 terms in the series
ii) Roundoff error in approximating 1/3
iii) Propagation of the roundoff error in x to the function value
Let's analyze them one by one.
i) Truncation error:

  e^(−x) = 1 − x + (1/2)x² − (1/6)x³ + E_T(x)                  (2)

where the truncation error is of the form

  E_T(x) = (1/24) e^(−ξ) x⁴     (ξ between 0 and x)            (3)

If x is positive then so is ξ  =>  exp(−ξ) < 1

  =>  E_T(x) ≤ (1/24) x⁴

Using (1), we can get

  exp(−1/3) ≈ 1 − 1/3 + 1/18 − 1/162 = 116/162                 (4)

with a truncation error between 0 and (1/24)(1/3)⁴ = 1/1944 ≈ 0.000514 < 0.00052.
ii) Roundoff error:
If 116/162 is rounded to four places, we get exp(−1/3) ≈ 116/162 ≈ 0.7160, with a
roundoff error no larger than 0.00005.

Thus the error in approximating exp(−1/3) by 0.7160 will be no larger than

  0.00052 + 0.00005 = 0.00057

Note, in (4), there is only one roundoff error, in rounding 116/162.

If we round 1/3, 1/18, and 1/162 individually, the total roundoff error could
be as large as 0.00015 when we combine all four terms in (4). In fact, this is
often the case in computing summations. In that case, the error could
be as large as 0.00067.
iii) What if we only know that x is between 0.333 and 0.334, resulting from a
"measurement" of x? How does this uncertainty affect the accuracy of
exp(−x)?
For f(x) = exp(−x), the derivative is f′(x) = −exp(−x), so an error δx in x
will cause an error in f(x) of the amount

  f′(x) δx ~ −exp(−x) δx

Thus if −0.0003 < δx < 0.0007 in approximating x = 1/3,
the maximum error in f(x) caused by the uncertainty in x will be

  0.716 * 0.0007 ≈ 0.0005

This error needs to be added to the truncation error in using (1).

So if we use (1) for exp(−x) while x is between 0.333 and 0.334, the largest
possible error will be

  0.00052 + 0.0005 ≈ 0.00107

if we do not include the roundoff error in summing the 4 terms.

(1 − 0.334 + 0.5*0.334² − 0.334³/6 = 0.715568 using Excel. Compared with the
exact value, the error is 0.000963, which is close to the maximum 0.00107.)

If we keep 4 decimal places in summing these 4 terms, then the total error
will be

  0.00052 + 3*0.00005 + 0.0005 ≈ 0.00122
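The same budget can be checked directly (a Python sketch reproducing the Excel computation quoted above):

```python
import math

x = 0.334                                   # measured value; intended x = 1/3
approx = 1.0 - x + 0.5*x**2 - x**3/6.0      # 4-term series, Eq. (1)
exact = math.exp(-1.0/3.0)

print(approx)                               # 0.715568...
print(abs(approx - exact))                  # ~0.000963, within the 0.00107 bound
```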
Example 10: Improving accuracy by avoiding loss-of-significance error
For ε = 1.0x10⁻¹⁸, A = 1, B = 1 − ε, the following operations will suffer loss of
accuracy:
i) (1 − cos(ε))/ε²
ii) ∛(1 + ε) − 1
Give ways to EXACTLY evaluate the above expressions (which should
give the exact result when ε = O(1)) while avoiding loss-of-significance errors
when ε is very small.

Solution:

Observations:
We note that even if we use double precision, direct calculation in an Excel
spreadsheet will give (1 − cos(ε))/ε² = 0 and ∛(1 + ε) − 1 = 0, because the
machine cannot distinguish 1 from cos(ε) for ε = 1.0x10⁻¹⁸, so that (1 − cos(ε))
is treated as 0 by the machine, and ∛(1 + ε) is treated as 1 since 1 + ε is
first calculated to be 1.

i) We use the trigonometric identity 1 − cos(ε) = 2 sin²(ε/2), so that

  (1 − cos(ε))/ε² = 2 [sin(ε/2)/ε]²

Since sin(ε/2) = 5x10⁻¹⁹ is obtained with no loss of accuracy here, the
ratio sin(ε/2)/ε = 1/2 can be calculated without any loss of accuracy.

Thus, (1 − cos(ε))/ε² = 2 [sin(ε/2)/ε]² = 1/2.

ii) ∛x − 1 = (x − 1)/(x^(2/3) + x^(1/3) + 1). For x = 1 + ε, this becomes

  ∛(1 + ε) − 1 = ε/((1 + ε)^(2/3) + (1 + ε)^(1/3) + 1)

The above is exact for any value of ε. When ε = 1.0x10⁻¹⁸, the computer
treats the numerator ε exactly and gives a value of 3 for the denominator.
Hence the result is

  ε/((1 + ε)^(2/3) + (1 + ε)^(1/3) + 1) = ε/3 = 3.3333333x10⁻¹⁹.
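In Python (double precision) the same loss of significance, and its cure, can be seen directly (a minimal sketch):

```python
import math

eps = 1.0e-18

# naive forms: catastrophic cancellation, both evaluate to 0
naive_i  = (1.0 - math.cos(eps)) / eps**2   # cos(eps) rounds to exactly 1.0
naive_ii = (1.0 + eps)**(1.0/3.0) - 1.0     # 1 + eps rounds to exactly 1.0

# reformulated forms: no subtraction of nearly equal numbers
stable_i  = 2.0 * (math.sin(eps/2.0) / eps)**2
stable_ii = eps / ((1.0 + eps)**(2.0/3.0) + (1.0 + eps)**(1.0/3.0) + 1.0)

print(naive_i, stable_i)    # 0.0 vs 0.5
print(naive_ii, stable_ii)  # 0.0 vs 3.333...e-19
```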

Example 11 Acceleration of convergence

Evaluate the series  S = Σ_{k=1}^∞ a_k = Σ_{k=1}^∞ 1/k²  using a smaller number of terms.
(The exact result is S = π²/6; but we pretend that we do not know it.)
Solution:
Let's first consider the following series whose exact sum we know how to
compute using a high school algebra approach (telescoping):

  C = Σ_{k=1}^∞ b_k = Σ_{k=1}^∞ 1/(k(k+1)) = Σ_{k=1}^∞ [1/k − 1/(k+1)] = 1

We notice that lim_{k→∞} a_k/b_k = 1. Thus we express S in the form

  S = Σ_{k=1}^∞ a_k = Σ_{k=1}^∞ [b_k + (1 − b_k/a_k) a_k] = C + Σ_{k=1}^∞ (1 − b_k/a_k) a_k

The convergence of the latter series is faster because 1 − b_k/a_k tends to 0
as k tends to infinity. Thus,

  S = 1 + Σ_{k=1}^∞ [1 − k²/(k(k+1))] (1/k²) = 1 + Σ_{k=1}^∞ 1/(k²(k+1))
The process can be repeated, this time for

  S* = Σ_{k=1}^∞ 1/(k²(k+1))

using

  C* = Σ_{k=1}^∞ 1/(k(k+1)(k+2)) = Σ_{k=1}^∞ [1/(2k) − 1/(k+1) + 1/(2(k+2))]

     = (1/2) Σ_{k=1}^∞ [1/k − 1/(k+1)] − (1/2) Σ_{k=1}^∞ [1/(k+1) − 1/(k+2)]
     = 1/2 − (1/2)(1/2) = 1/4

Thus

  S = 1 + 1/4 + Σ_{k=1}^∞ [1 − k²(k+1)/(k(k+1)(k+2))] · 1/(k²(k+1))

    = 5/4 + 2 Σ_{k=1}^∞ 1/(k²(k+1)(k+2))
Clearly, the last series converges much faster than the original one.
After N applications of the transformation,

  S = Σ_{k=1}^{N} 1/k² + N! Σ_{k=1}^∞ 1/(k²(k+1)(k+2)...(k+N))

Now consider N = 5:

  Σ_{k=1}^{5} 1/k² = 1.463611111

  5! Σ_{k=1}^{30} 1/(k²(k+1)(k+2)...(k+5)) = 0.1813229391

Hence, S ≈ 1.4636111111 + 0.1813229391 = 1.6449340502.

Comparing with the exact result S = π²/6 = 1.6449340668,
it has 8 significant digits, or Error = 1.66x10⁻⁸.
Using the original series with the same number of terms, we get

  S ≈ Σ_{k=1}^{35} 1/k² = 1.61676691491

with an error of 0.02816715 (= π²/6 − 1.61676691491); this error can
also be estimated roughly using

  Error = Σ_{k=36}^∞ 1/k² ~ ∫_{35.5}^∞ dx/x² = 1/35.5 = 0.0281690

(where we have used the rectangle (midpoint) rule for numerical integration
with Δx = 1 to replace the summation. That is,

  ∫_{35.5}^{36.5} dx/x² ~ Δx/36² = 1/36²;   ∫_{k−0.5}^{k+0.5} dx/x² ~ Δx/k² = 1/k²;

  =>  ∫_{35.5}^∞ dx/x² ~ 1/36² + 1/37² + ... = Σ_{k=36}^∞ 1/k² )
If we want to reduce the error to about 1.66x10⁻⁸, we would
need about 6x10⁷ terms in the original series. The saving from the
acceleration technique is thus a factor of about 6x10⁷/35 ≈ 1.7x10⁶.
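The whole example fits in a few lines of Python (a sketch; the helper names are ours):

```python
import math

def plain_sum(n):
    # direct partial sum of 1/k^2
    return sum(1.0/k**2 for k in range(1, n + 1))

def accelerated_sum(N, M):
    # sum_{k=1}^{N} 1/k^2  +  N! * sum_{k=1}^{M} 1/(k^2 (k+1)(k+2)...(k+N))
    head = sum(1.0/k**2 for k in range(1, N + 1))
    tail = 0.0
    for k in range(1, M + 1):
        denom = k*k
        for j in range(1, N + 1):
            denom *= (k + j)
        tail += 1.0/denom
    return head + math.factorial(N)*tail

exact = math.pi**2 / 6.0
print(abs(plain_sum(35) - exact))           # ~2.8e-2 with 35 terms
print(abs(accelerated_sum(5, 30) - exact))  # ~1.7e-8 with the same 35 terms
```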
