
EP501 Numerical Methods Homework 1 Solution Due Sep 13, 2013 11:59 pm

1 Infinite Series (30 points)


The completed function HW1_p1.m is listed below.

%% EP501 HW1 problem 1
function [err_fwd, err_bkd, err_appr, s1, s2] = HW1_p1(N)
% compare the forward and backward summation of a series
%
% INPUT:
%   N        : number of terms to sum
% OUTPUT:
%   err_fwd  : true error with forward sum
%   err_bkd  : true error with backward sum
%   err_appr : approximate error
%   s1       : forward sum
%   s2       : backward sum

% N=1e8;

% sum forward (accumulate in single precision)
s1=zeros(1,1,'single');
for i=1:N
    s1 = s1 + 1/i^2;
end

% sum backward (accumulate in single precision)
s2=zeros(1,1,'single');
for i=N:-1:1
    s2 = s2 + 1/i^2;
end

% true value
s0=(pi^2/6);

% relative errors
err_fwd = abs(s1-s0)/s0;
err_bkd = abs(s2-s0)/s0;
err_appr = abs(s2-s1)/s2;

For $N$ up to $10^8$, the test script HW1_p1_test.m produces the following figure.

1) The backward sum is more accurate. In the forward sum, the running total quickly grows to order 1 (it approaches $\pi^2/6 \approx 1.645$) while the terms $1/i^2$ keep shrinking, so most additions combine two numbers of very different magnitude and the low-order digits of the small term are lost. In the backward sum the smallest terms are added first: $1/N^2$ and $1/(N-1)^2$ have comparable magnitude when $N$ is large, and the running total grows only gradually, so each new term is much closer in magnitude to it than in the forward sum. The loss of significant figures is therefore small, and the backward sum is much more accurate than the forward sum.
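The loss of significant figures in the forward sum can be checked directly. The sketch below is an illustrative check, not part of HW1_p1.m, and `n_stall` is a name introduced here. It finds the first index at which adding $1/i^2$ leaves a single-precision running total unchanged. Because single-precision numbers between 1 and 2 are spaced $2^{-23}$ apart, this should happen close to $i = 4096$, where $1/i^2 = 2^{-24}$ is half that spacing.

```matlab
% Find the first index at which adding 1/i^2 no longer changes a
% single-precision forward sum (illustrative check, not part of HW1_p1.m).
s = single(0);
n_stall = 0;                      % assumed name: index where the sum stalls
for i = 1:1e5
    s_new = s + single(1/i^2);    % addition carried out in single precision
    if s_new == s                 % the term was rounded away completely
        n_stall = i;
        break
    end
    s = s_new;
end
fprintf('forward sum stops changing at i = %d, s = %.7f\n', n_stall, s);
```

Every later term is smaller still, so none of them can change the forward sum.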


2) Explain the results in the plot. Why do the three errors vary differently with increasing $N$?

The backward sum error decreases steadily with $N$, roughly as $1/N$, which is a straight line on log-log axes. This is because as more terms are included, the truncation error (the ignored higher-$n$ terms) becomes smaller and the sum gets closer to the true value.

The forward sum error decreases in the same way until about $N = 10^4$, for the same reason. Beyond $N = 10^4$, the error due to loss of significant figures dominates and prevents any further decrease in error.

The approximate error is very small initially, because the forward and backward sums differ little while both are dominated by the same truncation error. Starting at about $N = 10^4$, the truncation error becomes less significant and the difference is dominated by the loss of significant figures in the forward sum, so the approximate error becomes the same as the true error of the forward sum.

3) MATLAB gives eps('single') = 1.192e-07. The backward sum error can reach below $10^{-6}$, close to this value. This means the backward sum is very accurate: the summation adds little round-off error beyond the precision of the number format itself. The forward sum error stops decreasing at about $N = 10^4$, when the smallest term is about $1/n^2 = 10^{-8}$, which is small enough relative to the running total (about 1.6) to fall below eps and be rounded away. This is consistent with the description above.
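The size of the truncation error discussed above can be estimated with a standard integral bound on the neglected tail of the series. This estimate is added here for illustration and is not part of the original solution:

```latex
\frac{1}{N+1} = \int_{N+1}^{\infty} \frac{dx}{x^2}
  \;<\; \sum_{n=N+1}^{\infty} \frac{1}{n^2}
  \;<\; \int_{N}^{\infty} \frac{dx}{x^2} = \frac{1}{N},
\qquad\text{so}\qquad
\frac{1}{s_0}\sum_{n=N+1}^{\infty} \frac{1}{n^2} \approx \frac{6}{\pi^2 N} \approx \frac{0.61}{N}.
```

For $N = 10^4$ this gives a relative truncation error of about $6 \times 10^{-5}$. For $N = 10^8$ it is about $6 \times 10^{-9}$, which is below eps('single'), so at that point round-off rather than truncation limits the backward sum.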

2 Floating Point Number Set (30 points)


The completed MATLAB function HW1_p2.m is shown below.

%% EP501 HW1 problem 2
function numlist = HW1_p2(E,M)
% Calculate all possible non-negative numbers that can be
% represented by a 7-bit floating point variable
%
% INPUT:
%   E : number of bits for the exponent
%   M : number of bits for the mantissa
%
% OUTPUT:
%   numlist : a 1-d array that lists all possible non-negative numbers,
%             sorted from smallest to largest values.
%

T=E+M+1;        % number of total bits

b=1-2^(E-1);    % bias
emax=2^E-1;     % maximum possible value of e

N=2^(T-1);      % total possible binary numbers to consider
numlist=zeros(N,1);

% loop through all bit patterns with sign bit 0 (negative numbers excluded)
for i=1:N

    s = dec2bin(i-1);    % binary string representing the number

    % pad with leading 0's to make s a string of length T-1
    for k=length(s)+1:T-1
        s=['0' s];
    end

    % extract exponent part
    e = bin2dec(s(1:E));    % exponent in decimal form

    % extract mantissa part
    f = bin2dec(s(E+1:T-1))/2^M;

    if (e>0 && e<emax)      % normal number
        x = 2^(e+b)*(1+f);
    elseif (e==0)           % subnormal number
        x = 2^(1+b)*f;
    else                    % infinity or NaN
        if f==0
            x = Inf;
        else
            x = NaN;
        end
    end
    fprintf('#%02d 0 %s %s = %g\n',i,s(1:E),s(E+1:T-1),x);
    numlist(i)=x;

end

A larger E gives a larger range of normal numbers. A larger M gives more numbers for each value of e, and therefore more precision.
The number distribution is generally exponential (it appears as a straight line in the logarithmic plot). This is because the numbers increase mostly through the change in the exponent. The subnormal (denormal) numbers increase linearly, because their exponent is constant and the increase is due only to the fractional part $f$.
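The linear and exponential regimes described above can be seen in the gaps between neighbouring numbers. The following sketch assumes E = 3 and M = 3 purely for illustration (the problem's actual bit split is not restated here) and reuses the bias convention of HW1_p2.m. The variable names are introduced for this example.

```matlab
% Spacing of a small floating-point set. E = 3 and M = 3 are assumed
% values chosen for illustration (1 sign bit + 3 + 3 = 7 bits).
E = 3;  M = 3;
b = 1 - 2^(E-1);              % bias, same convention as HW1_p2.m
f = (0:2^M-1) / 2^M;          % every possible fraction f

sub  = 2^(1+b) * f;           % subnormal numbers (e = 0)
nrm1 = 2^(1+b) * (1 + f);     % normal numbers with e = 1
nrm2 = 2^(2+b) * (1 + f);     % normal numbers with e = 2

gap_sub  = unique(diff(sub));   % one constant gap: linear growth
gap_nrm1 = unique(diff(nrm1));  % same gap as the subnormals
gap_nrm2 = unique(diff(nrm2));  % gap doubles with each step in e
fprintf('gaps: %g (subnormal), %g (e=1), %g (e=2)\n', gap_sub, gap_nrm1, gap_nrm2);
```

Within one value of e the gap is constant. It doubles each time e increases by one, which is what produces the straight line on a logarithmic axis.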

3 Maclaurin Series (15 points)


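The first three nonzero terms of the Maclaurin series for the arctangent function are $x - (1/3)x^3 + (1/5)x^5$. Compute the absolute error and relative error in the following approximations of $\pi$ using the polynomial in place of the arctangent:

$$\pi = 3.1415926535897932$$

a.
$$p_1 = 4\left[\arctan\left(\frac{1}{2}\right) + \arctan\left(\frac{1}{3}\right)\right] = 3.1415926535897932$$

$$p_1' = 4\left[\frac{1}{2} - \frac{1}{3}\left(\frac{1}{2}\right)^3 + \frac{1}{5}\left(\frac{1}{2}\right)^5 + \frac{1}{3} - \frac{1}{3}\left(\frac{1}{3}\right)^3 + \frac{1}{5}\left(\frac{1}{3}\right)^5\right] = 3.1455761316872428$$

$$|p_1' - p_1| = 0.0039834780974495$$

$$\left|\frac{p_1' - p_1}{p_1}\right| = 0.1268\%$$

b.
$$p_2 = 16\arctan\left(\frac{1}{5}\right) - 4\arctan\left(\frac{1}{239}\right) = 3.1415926535897932$$

$$p_2' = 16\left[\frac{1}{5} - \frac{1}{3}\left(\frac{1}{5}\right)^3 + \frac{1}{5}\left(\frac{1}{5}\right)^5\right] - 4\left[\frac{1}{239} - \frac{1}{3}\left(\frac{1}{239}\right)^3 + \frac{1}{5}\left(\frac{1}{239}\right)^5\right] = 3.1416210293250346$$

$$|p_2' - p_2| = 0.0000283757352410$$

$$\left|\frac{p_2' - p_2}{p_2}\right| = 0.0009032\%$$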
Both arctangent expressions reproduce $\pi$ to within machine precision. The Maclaurin series approximation has a relative error of about 0.13% for the first expression and 0.00090% for the second.
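These values can be reproduced with a few lines of MATLAB. The sketch below is illustrative: the names `P`, `p1_approx` and `p2_approx` are introduced here. It also compares each error with the alternating-series bound $x^7/7$ on the first neglected term, which suggests why the second expression, with its smaller arguments, is far more accurate.

```matlab
% Three-term Maclaurin polynomial for arctan(x)
P = @(x) x - x.^3/3 + x.^5/5;

p1_approx = 4*(P(1/2) + P(1/3));       % polynomial in place of arctan
p2_approx = 16*P(1/5) - 4*P(1/239);

abs1 = abs(p1_approx - pi);   rel1 = abs1/pi;
abs2 = abs(p2_approx - pi);   rel2 = abs2/pi;

% Alternating-series bound: the error of P(x) is at most x^7/7 for 0 < x <= 1
bound1 = 4*((1/2)^7 + (1/3)^7)/7;
bound2 = (16*(1/5)^7 + 4*(1/239)^7)/7;

fprintf('a: abs = %.3e (bound %.3e), rel = %.4f%%\n', abs1, bound1, 100*rel1);
fprintf('b: abs = %.3e (bound %.3e), rel = %.7f%%\n', abs2, bound2, 100*rel2);
```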


4 Rounding Arithmetic (15 points)


a. $\frac{1}{3}x^2 - \frac{123}{4}x + \frac{1}{6} = 0$

Exact solutions are

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{123/4 + \sqrt{(123/4)^2 - 4(1/3)(1/6)}}{2(1/3)} = 92.2446 \tag{1}$$

$$x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} = \frac{123/4 - \sqrt{(123/4)^2 - 4(1/3)(1/6)}}{2(1/3)} = 0.00542037 \tag{2}$$

The equation with four significant figures is

$$0.3333x^2 - 30.75x + 0.1667 = 0$$

$$\sqrt{b^2 - 4ac} = \sqrt{30.75^2 - 4(0.3333)(0.1667)} = \sqrt{945.6 - 0.2222} = \sqrt{945.4} = 30.75 \tag{3}$$

$$x_1' = \frac{30.75 + 30.75}{2(0.3333)} = 92.26 \tag{4}$$

$$|x_1' - x_1| = 0.0154, \quad |x_1' - x_1|/|x_1| = 0.0167\% \tag{5}$$

$$x_2' = \frac{30.75 - 30.75}{2(0.3333)} = 0.000 \tag{6}$$

$$|x_2' - x_2| = 0.005420, \quad |x_2' - x_2|/|x_2| = 100\% \tag{7}$$

Alternatively,

$$x_1'' = \frac{-2c}{b + \sqrt{b^2 - 4ac}} = \frac{-2(0.1667)}{-30.75 + 30.75} = \infty \tag{8}$$

$$|x_1'' - x_1| = \infty, \quad |x_1'' - x_1|/|x_1| = \infty \tag{9}$$

$$x_2'' = \frac{-2c}{b - \sqrt{b^2 - 4ac}} = \frac{-2(0.1667)}{-30.75 - 30.75} = 0.005421 \tag{10}$$

$$|x_2'' - x_2| = 6.3 \times 10^{-7}, \quad |x_2'' - x_2|/|x_2| = 0.0116\% \tag{11}$$
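The alternative formulas come from rationalizing the numerator of the quadratic formula. A short derivation for $x_1$, using the same $a$, $b$, $c$ as above (added here for illustration):

```latex
x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}
      \cdot \frac{-b - \sqrt{b^2 - 4ac}}{-b - \sqrt{b^2 - 4ac}}
    = \frac{b^2 - (b^2 - 4ac)}{2a\left(-b - \sqrt{b^2 - 4ac}\right)}
    = \frac{-2c}{b + \sqrt{b^2 - 4ac}},
\qquad
x_2 = \frac{-2c}{b - \sqrt{b^2 - 4ac}} \ \text{(by the same steps)}.
```

Which form is safe depends on the sign of $b$. In part a, $b < 0$, so $b + \sqrt{b^2 - 4ac}$ subtracts two nearly equal numbers while $b - \sqrt{b^2 - 4ac}$ does not. For each root, the form that avoids this subtraction is the accurate one.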

b. $1.002x^2 + 11.01x + 0.01265 = 0$

Exact solutions are

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{-11.01 + \sqrt{11.01^2 - 4(1.002)(0.01265)}}{2(1.002)} = -0.001149076 \tag{12}$$

$$x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} = \frac{-11.01 - \sqrt{11.01^2 - 4(1.002)(0.01265)}}{2(1.002)} = -10.9869 \tag{13}$$

The calculation with four significant figures gives

$$\sqrt{b^2 - 4ac} = \sqrt{11.01^2 - 4(1.002)(0.01265)} = \sqrt{121.2 - 0.05070} = \sqrt{121.1} = 11.00 \tag{14}$$

$$x_1' = \frac{-11.01 + 11.00}{2(1.002)} = \frac{-0.01000}{2.004} = -0.004990 \tag{15}$$

$$|x_1' - x_1| = 0.00384, \quad |x_1' - x_1|/|x_1| = 334\% \tag{16}$$

$$x_2' = \frac{-11.01 - 11.00}{2(1.002)} = -10.98 \tag{17}$$

$$|x_2' - x_2| = 0.00687, \quad |x_2' - x_2|/|x_2| = 0.0626\% \tag{18}$$

Alternatively,

$$x_1'' = \frac{-2c}{b + \sqrt{b^2 - 4ac}} = \frac{-2(0.01265)}{11.01 + 11.00} = -0.001149 \tag{19}$$

$$|x_1'' - x_1| = 7.6 \times 10^{-8}, \quad |x_1'' - x_1|/|x_1| = 0.00661\% \tag{20}$$

$$x_2'' = \frac{-2c}{b - \sqrt{b^2 - 4ac}} = \frac{-2(0.01265)}{11.01 - 11.00} = -2.530 \tag{21}$$

$$|x_2'' - x_2| = 8.46, \quad |x_2'' - x_2|/|x_2| = 77.0\% \tag{22}$$
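The four-digit arithmetic for the small root in part b can be emulated in MATLAB. This is a minimal sketch, not part of the original solution: the helper `fl` is a name introduced here, and it rounds each intermediate result to four significant decimal digits by going through `sprintf` and `str2double`. It compares the standard formula with the rationalized one for $x_1$.

```matlab
% Four-significant-digit rounding emulated with sprintf/str2double.
% fl is an assumed helper name, not part of the original solution.
fl = @(x) str2double(sprintf('%.3e', x));

a = 1.002;  b = 11.01;  c = 0.01265;            % coefficients of part b

d = fl(sqrt(fl(fl(b*b) - fl(fl(4*a)*c))));      % rounded sqrt(b^2 - 4ac)

x1_std = fl(fl(-b + d) / fl(2*a));              % standard formula (cancellation)
x1_alt = fl(fl(-2*c)  / fl(b + d));             % rationalized formula

x1 = (-b + sqrt(b^2 - 4*a*c)) / (2*a);          % double-precision reference
fprintf('standard:     %g, relative error %.3g%%\n', x1_std, 100*abs((x1_std - x1)/x1));
fprintf('rationalized: %g, relative error %.3g%%\n', x1_alt, 100*abs((x1_alt - x1)/x1));
```

Rounding after every operation is one reasonable reading of "four-digit arithmetic". A different rounding convention, such as chopping, could change the last digit of the intermediate values.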

5 64-bit Floating-Point Numbers (10 points)


1. 0 10000001010 1001001100000000000000000000000000000000000000000000

$$(-1)^0 \, 2^{\,2^{10} + 2^3 + 2^1 - (2^{10} - 1)} \left(1 + 2^{-1} + 2^{-4} + 2^{-7} + 2^{-8}\right) = 2^{11}(1.57421875) = 3224$$

2. 0 01111111111 0101001100000000000000000000000000000000000000000001

$$(-1)^0 \, 2^{\,2^{10} - 1 - (2^{10} - 1)} \left(1 + 2^{-2} + 2^{-4} + 2^{-7} + 2^{-8} + 2^{-52}\right) = 1.3242187500000002220446049250313080847263336181640625$$
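The same decoding can be done programmatically. The sketch below uses `decode`, a hypothetical helper introduced here, which handles only normal numbers. It applies the formula above to the sign, exponent and mantissa bit strings of the two patterns.

```matlab
% Decode a normal IEEE 754 double from its sign, exponent and mantissa bit strings.
% decode is an assumed helper; it does not handle subnormals, Inf or NaN.
decode = @(s, ex, m) (-1)^bin2dec(s) * 2^(bin2dec(ex) - 1023) ...
                     * (1 + sum(2.^(-find(m == '1'))));

m1 = ['1001001100', repmat('0', 1, 42)];        % 52 mantissa bits, pattern 1
m2 = ['0101001100', repmat('0', 1, 41), '1'];   % 52 mantissa bits, pattern 2

x1 = decode('0', '10000001010', m1);
x2 = decode('0', '01111111111', m2);
fprintf('x1 = %g\nx2 = %.17g\n', x1, x2);
```

The last mantissa bit of the second pattern contributes $2^{-52}$, which is the value of `eps` for doubles, so `x2` is the next representable number above 1.32421875.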
