1st Machine Numbers Computer Arithmetic and Errors


Machine Numbers

MAT307 Numerical Analysis Errors and Computer Arithmetics


Stability

MAT307 Numerical Analysis

Machine Numbers, Computer Arithmetics and Errors

Dr. Mustafa Ağgül

Dr. Mustafa Ağgül Hacettepe University



Numerical analysis is concerned with solving problems numerically; developing a sequence of numerical computations to obtain a satisfactory result.

A significant part of this process is to figure out the errors that arise in these computations. These errors are due to arithmetic operations or other sources.


Definition (Decimal floating-point representation)

Let $s \neq 0$ be written in the decimal system. Then it can be written uniquely as
$$s = \sigma \cdot d \cdot 10^{n}$$
where
- $\sigma = \pm 1$ is the sign,
- $1 \le d < 10$ is the significand or mantissa,
- $n$ is an integer, the exponent.

Example

$$126.329 = 1.26329 \cdot 10^{2}$$

Here $\sigma = +1$, the exponent is $n = 2$, and the mantissa is $d = 1.26329$.



Example

$$-726.329 = -7.26329 \cdot 10^{2}$$

Here $\sigma = -1$, the exponent is $n = 2$, and the mantissa is $d = 7.26329$.
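
The decomposition in the two examples above can be reproduced with a short sketch. The function name `decimal_normalize` is hypothetical, and Python's `decimal` module is used only so that the decimal digits are handled exactly; this is one possible illustration, not part of the original slides.

```python
from decimal import Decimal

def decimal_normalize(text):
    """Split a nonzero decimal string into (sigma, d, n) with s = sigma * d * 10**n."""
    s = Decimal(text)
    if s == 0:
        raise ValueError("s must be nonzero")
    sigma = -1 if s < 0 else 1
    n = s.adjusted()           # position of the leading digit, i.e. the exponent
    d = abs(s).scaleb(-n)      # shift the decimal point so that 1 <= d < 10
    return sigma, d, n

print(decimal_normalize("126.329"))    # sign +1, mantissa 1.26329, exponent 2
print(decimal_normalize("-726.329"))   # sign -1, mantissa 7.26329, exponent 2
```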


Computers use binary arithmetic, representing each number as a binary number: a finite sum of integer powers of 2. Some numbers can be represented exactly, but others, such as $\tfrac{1}{3}$, cannot.

Example

$$5.25 = 2^{2} + 2^{0} + 2^{-2}$$
has an exact representation in the finite binary (base 2) system. However,
$$\frac{1}{3} = 2^{-2} + 2^{-4} + 2^{-6} + \dots$$
does not.

A floating-point number, the way a real number is represented in a computer, is therefore in general only an approximation of the real number.
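
The difference between the two numbers can be checked directly. The sketch below assumes Python floats are IEEE double precision numbers and uses `fractions.Fraction` to recover the exact value a float stores; the variable names are illustrative only.

```python
from fractions import Fraction

# Fraction(float) returns the exact rational value held by the float.
exact_5_25 = Fraction(5.25) == Fraction(21, 4)   # 5.25 = 2**2 + 2**0 + 2**-2
exact_third = Fraction(1 / 3) == Fraction(1, 3)  # 1/3 needs infinitely many binary digits

# Partial sums 2**-2 + 2**-4 + ... + 2**-20 approach 1/3 without reaching it.
partial = sum(Fraction(1, 4**k) for k in range(1, 11))
gap = Fraction(1, 3) - partial

print(exact_5_25, exact_third, gap)
```

The remaining gap after ten terms is $\tfrac{1}{3} \cdot 4^{-10}$: small, but never zero for any finite number of terms.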


Definition (Binary floating-point representation)

Let $s \neq 0$ be written in the binary system. Then it can be written uniquely as
$$s = \sigma \cdot d \cdot 2^{n}$$
where
- $\sigma = \pm 1$ is the sign,
- $1 \le d < 2$ is the significand or mantissa,
- $n$ is an integer, the exponent.

Example

$$(10101.0111)_2 = (1.01010111)_2 \cdot 2^{4}$$

Here $\sigma = +1$, the exponent is $n = 4 = (100)_2$, and the mantissa is $d = (1.01010111)_2$.



Example

$$-(0.0101011)_2 = -(1.01011)_2 \cdot 2^{-2}$$

Here $\sigma = -1$, the exponent is $n = -2 = -(10)_2$, and the mantissa is $d = (1.01011)_2$.
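
A minimal sketch of the binary normalization, assuming the input is already a Python float. `math.frexp` returns a mantissa in $[0.5, 1)$, so it is rescaled to the $[1, 2)$ convention used here; `binary_normalize` and `mantissa_bits` are hypothetical helper names.

```python
import math

def binary_normalize(s):
    """Return (sigma, d, n) with s = sigma * d * 2**n and 1 <= d < 2."""
    if s == 0:
        raise ValueError("s must be nonzero")
    m, e = math.frexp(s)            # s = m * 2**e with 0.5 <= |m| < 1
    sigma = -1 if s < 0 else 1
    return sigma, 2 * abs(m), e - 1

def mantissa_bits(d, digits):
    """Write the first `digits` binary digits of d (1 <= d < 2) as '1.xxxx'."""
    frac, out = d - 1, []
    for _ in range(digits):
        frac *= 2
        bit = int(frac)
        out.append(str(bit))
        frac -= bit
    return "1." + "".join(out)

# (10101.0111)_2 = 21.4375 and -(0.0101011)_2 = -0.3359375
for value, digits in [(21.4375, 8), (-0.3359375, 5)]:
    sigma, d, n = binary_normalize(value)
    print(sigma, mantissa_bits(d, digits), n)
```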


The IEEE floating-point arithmetic standard is the format for floating-point numbers used in almost all computers.
- The IEEE single precision (32-bit) floating-point representation of $s$ reserves 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa:

    x          xxxxxxxx        xxxxxxxxxxxxxxxxxxxxxxx
    sign (1)   exponent (8)    mantissa (23)

- The IEEE double precision (64-bit) floating-point representation of $s$ reserves 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa:

    x          xxxxxxxxxxx     xxxxxxxxxxxxxxxxxxxxxxx...x
    sign (1)   exponent (11)   mantissa (52)


The single precision floating-point representation of $s$ has a precision of 24 binary digits (the leading 1 plus the 23 stored digits), and the exponent $n$ is limited by $-126 \le n \le 127$:

$$s = \sigma \cdot (1.a_1 a_2 \dots a_{23})_2 \cdot 2^{n}$$

The sign $\sigma$ and the digits $a_i$ are already binary. The exponent field needs to represent both positive and negative exponents, so a bias (127) is added to the actual exponent in order to get the stored exponent. Stored exponents in $[1, 126]$ correspond to negative actual exponents, 127 corresponds to $n = 0$, and those in $[128, 254]$ correspond to positive actual exponents; the stored values 0 and 255 are reserved. Letting $k$ be the number of bits reserved for the exponent, the bias is $2^{k-1} - 1$ (here $2^{8-1} - 1 = 127$).
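
As a worked instance of the bias rule, the following lines evaluate it for the single and double precision exponent widths. The symbol $e$ for the stored exponent is introduced here only for illustration.

```latex
\begin{aligned}
e &= n + \text{bias}, \qquad \text{bias} = 2^{k-1} - 1,\\
k = 8 &: \quad \text{bias} = 2^{7} - 1 = 127, \qquad n = -5 \;\mapsto\; e = 122 = (01111010)_2,\\
k = 11 &: \quad \text{bias} = 2^{10} - 1 = 1023, \qquad n = -5 \;\mapsto\; e = 1018.
\end{aligned}
```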


The stored exponents $(00000000)_2 = 0$ and $(11111111)_2 = 255$ are both reserved for special values such as $\pm 0$, $\pm\infty$, and NaN (Not a Number). For this reason, the greatest and the least positive normalized single precision floating-point numbers are as in the table below.

    sign   exponent   mantissa                   value
    0      11111110   11111111111111111111111    ≈ 3.40e+38
    0      00000001   00000000000000000000000    ≈ 1.18e-38
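
The two table entries can be checked by reinterpreting the bit patterns as single precision numbers. This is a sketch using Python's standard `struct` module; the helper name `single_from_bits` is hypothetical.

```python
import struct

def single_from_bits(pattern):
    """Interpret a string of 32 binary digits (spaces allowed) as an IEEE single."""
    bits = int(pattern.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

largest = single_from_bits("0 11111110 11111111111111111111111")
smallest = single_from_bits("0 00000001 00000000000000000000000")

print(largest)    # about 3.40e+38, equal to (2 - 2**-23) * 2**127
print(smallest)   # about 1.18e-38, equal to 2**-126
```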

Example

    1          = 00111111100000000000000000000000 = +1.0 × 2^(127−127)
    −2         = 11000000000000000000000000000000 = −1.0 × 2^(128−127)
    0.03515625 = 00111101000100000000000000000000 = +1.125 × 2^(122−127)
    264.03125  = 01000011100001000000010000000000 = +1.0313720703125 × 2^(135−127)

For the third number, the 32 bits split into the sign, exponent, and mantissa fields as 0 01111010 00100000000000000000000:
- The first bit is 0, making the number positive.
- The exponent is
$$(01111010)_2 - 127 = 1 \cdot 2^{1} + 1 \cdot 2^{3} + 1 \cdot 2^{4} + 1 \cdot 2^{5} + 1 \cdot 2^{6} - 127 = 122 - 127 = -5$$
- The mantissa is
$$(0.00100000000000000000000)_2 + 1 = 1 \cdot 2^{-3} + 1 = 1.125$$

Thus, the floating-point number 0 01111010 00100000000000000000000 corresponds to the real number $+1.125 \times 2^{-5} = 0.03515625$.
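
The decoding steps above can be sketched in a few lines. The function `decode_single` is a hypothetical helper built on the standard `struct` module; it assumes a normalized number (stored exponent between 1 and 254) and does not handle the reserved exponents.

```python
import struct

def decode_single(x):
    """Return (bit pattern, sigma, n, d) for a normalized single precision x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    pattern = format(bits, "032b")
    sign, exponent, mantissa = pattern[0], pattern[1:9], pattern[9:]
    sigma = -1 if sign == "1" else 1
    n = int(exponent, 2) - 127              # remove the bias
    d = 1 + int(mantissa, 2) / 2**23        # restore the implicit leading 1
    return pattern, sigma, n, d

for value in (1.0, -2.0, 0.03515625, 264.03125):
    print(value, decode_single(value))
```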


Infinitely many real numbers must be represented by finitely many floating-point numbers in a computer. Suppose that a number $s$ has the mantissa $d = (1.a_1 a_2 \dots a_{n-1} a_n a_{n+1} \dots)_2$, but the floating-point representation can contain only $n$ binary digits. Then $s$ must be shortened when stored.

Definition
We denote the machine floating-point representation of $s$ by $fl(s)$. It is obtained in one of two ways:
- truncate or chop $d$ to $n$ binary digits, ignoring the remaining digits;
- round $d$ to $n$ binary digits, based on the size of the part of $d$ following digit $n$:
  a) if digit $n+1$ is 0, chop $d$ to $n$ digits;
  b) if digit $n+1$ is 1, chop $d$ to $n$ digits and add 1 to the last digit of the result.


It can be shown that
$$fl(s) = s \cdot (1 + \varepsilon)$$
where $\varepsilon$ is a small number depending on $s$.
a) If chopping is used, $-2^{-n+1} \le \varepsilon \le 0$.
b) If rounding is used, $-2^{-n} \le \varepsilon \le 2^{-n}$.

Due to chopping:
- the worst possible error is twice as large as when rounding is used;
- the sign of the error $s - fl(s)$ is the same as the sign of $s$, so there is no possibility of cancellation of errors.

Due to rounding:
- the worst possible error is only half as large as when chopping is used;
- the error $s - fl(s)$ is negative in only about half of the cases, which leads to better error propagation behavior.
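
Chopping and rounding can be compared on a small case. The sketch below assumes that $n$ counts the digits $a_1 \dots a_n$ kept after the binary point, and it works on ordinary Python floats; `chop` and `round_to` are hypothetical names, not a library API.

```python
import math

def _normalized(s):
    m, e = math.frexp(s)        # s = m * 2**e with 0.5 <= |m| < 1
    return 2 * m, e - 1         # s = d * 2**n with 1 <= |d| < 2

def chop(s, n):
    """Keep n binary digits after the point of the mantissa and drop the rest."""
    d, exp = _normalized(s)
    kept = math.trunc(d * 2**n)             # discards digits n+1, n+2, ...
    return math.ldexp(float(kept), exp - n)

def round_to(s, n):
    """Chop to n digits, then add 1 to the last digit if digit n+1 is 1."""
    d, exp = _normalized(s)
    kept = math.floor(abs(d) * 2**n + 0.5)  # adding 0.5 carries exactly when digit n+1 is 1
    return math.copysign(math.ldexp(float(kept), exp - n), s)

s, n = 0.1, 4                               # 0.1 = (1.10011001...)_2 * 2**-4
eps_chop = (chop(s, n) - s) / s
eps_round = (round_to(s, n) - s) / s
print(chop(s, n), eps_chop)                 # mantissa becomes (1.1001)_2
print(round_to(s, n), eps_round)            # mantissa becomes (1.1010)_2
```

In this case chopping gives a negative $\varepsilon$ and rounding a smaller positive one, and both values lie inside the stated bounds.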

Definition
- The error in an approximated quantity is defined as
$$E(s_A) = |s_T - s_A|$$
where $s_T$ is the true value and $s_A$ is the approximate value. This is also called the absolute error.
- The relative error $Rel(s_A)$ is the ratio of the error to the true value:
$$Rel(s_A) = \frac{\text{error}}{\text{true value}} = \frac{|s_T - s_A|}{|s_T|}$$


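The relative error is a more intrinsic measure than the absolute error, because it relates the error to the size of the quantity being approximated.

Example
- The exact distance between two cities is $d_{1T} = 100$ km and the measured distance is $d_{1A} = 99$ km:
$$E(d_{1A}) = |d_{1T} - d_{1A}| = 1 \text{ km}, \qquad Rel(d_{1A}) = \frac{E(d_{1A})}{d_{1T}} = 0.01 = 1\%$$
- The exact distance between two cities is $d_{2T} = 2$ km and the measured distance is $d_{2A} = 1$ km:
$$E(d_{2A}) = |d_{2T} - d_{2A}| = 1 \text{ km}, \qquad Rel(d_{2A}) = \frac{E(d_{2A})}{d_{2T}} = 0.50 = 50\%$$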
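
The two distance measurements can be evaluated with a pair of small functions. The names `absolute_error` and `relative_error` are chosen here for illustration.

```python
def absolute_error(true_value, approx):
    """E(s_A) = |s_T - s_A|."""
    return abs(true_value - approx)

def relative_error(true_value, approx):
    """Rel(s_A) = |s_T - s_A| / |s_T|; the true value must be nonzero."""
    return abs(true_value - approx) / abs(true_value)

# Same absolute error of 1 km, very different relative errors.
print(absolute_error(100, 99), relative_error(100, 99))
print(absolute_error(2, 1), relative_error(2, 1))
```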


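When doing calculations with numbers that contain an error, the result will be affected by these errors. This phenomenon is called the propagation of error. For example, let
$$x = fl(x) + E(x) \quad \text{and} \quad y = fl(y) + E(y),$$
where $E(x)$ and $E(y)$ here denote the signed errors of the stored values. Taking $fl(xy) = fl(x)\,fl(y)$, that is, ignoring the rounding of the product itself,
$$\frac{|xy - fl(xy)|}{|xy|} = \frac{|xy - (x - E(x))(y - E(y))|}{|xy|} = \frac{|yE(x) + xE(y) - E(x)E(y)|}{|xy|} \qquad (1.1)$$
If $|E(x)| \ll |x|$ and $|E(y)| \ll |y|$, then the term $E(x)E(y)$ can be ignored. Therefore,
$$Rel(xy) = \frac{|xy - fl(xy)|}{|xy|} \approx \frac{|yE(x) + xE(y)|}{|xy|} \le \frac{|E(x)|}{|x|} + \frac{|E(y)|}{|y|} = Rel(x) + Rel(y),$$
with equality in the last step when $E(x)/x$ and $E(y)/y$ have the same sign.

Remark
To first order, $Rel(xy) \approx Rel(x) + Rel(y)$ in the worst case: relative errors add under multiplication.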
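
A small numerical check of this approximation, using exact rational arithmetic so that only the perturbations chosen below play a role. The values of `x_approx` and `y_approx` are made up for illustration, and both errors have the same sign.

```python
from fractions import Fraction

def rel(true_value, approx):
    """Relative error |true - approx| / |true|."""
    return abs(true_value - approx) / abs(true_value)

x_true, x_approx = Fraction(3), Fraction("2.997")
y_true, y_approx = Fraction(7), Fraction("6.993")

rel_x = rel(x_true, x_approx)                          # 1/1000
rel_y = rel(y_true, y_approx)                          # 1/1000
rel_xy = rel(x_true * y_true, x_approx * y_approx)

# The difference is exactly the neglected second-order term Rel(x) * Rel(y).
print(rel_xy, rel_x + rel_y, (rel_x + rel_y) - rel_xy)
```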


Some numerical methods tolerate small fluctuations (errors) in the input data or in the computation itself; others magnify them, so that the computed solution deviates from the exact solution exponentially fast. Calculations that can be proven not to magnify approximation errors are called numerically stable.

Example
Finding the roots of $ax^2 + bx + c = 0$ by the quadratic formula
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$
is unstable if $b^2 \gg |4ac|$:
- if $b \le 0$, cancellation occurs in $x_2$ since $-b \approx \sqrt{b^2 - 4ac}$;
- if $b \ge 0$, cancellation occurs in $x_1$ since $b \approx \sqrt{b^2 - 4ac}$.

In both cases $b$ may be exact, but the square root introduces errors, and the subtraction of two nearly equal numbers magnifies them.
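
The cancellation can be observed in double precision. The sketch below compares the textbook formula with a commonly used rearrangement that is not part of the slides: compute the root that involves no cancellation first, then obtain the other one from the product of the roots, $x_1 x_2 = c/a$. The function names are hypothetical, and real roots with $a \neq 0$ and a nonzero intermediate `q` are assumed.

```python
import math

def roots_naive(a, b, c):
    """Textbook quadratic formula, returned as (x1, x2)."""
    r = math.sqrt(b * b - 4 * a * c)
    return (-b + r) / (2 * a), (-b - r) / (2 * a)

def roots_stable(a, b, c):
    """Avoid subtracting nearly equal numbers; returns (small root, large root)."""
    r = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(r, b)) / 2   # b and sign(b) * r have the same sign
    return c / q, q / a                  # second root from x1 * x2 = c / a

# b**2 >> |4ac|: the roots are close to -1e-8 and -1e8.
a, b, c = 1.0, 1e8, 1.0
small_naive = roots_naive(a, b, c)[0]
small_stable = roots_stable(a, b, c)[0]
print(small_naive, small_stable)
```

With these coefficients the naive small root loses most of its significant digits, while the rearranged version agrees with $-10^{-8}$ to nearly full double precision.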
