Gauss and the Method of Least Squares

Teddy Petrou Hongxiao Zhu

1
Outline

• Who was Gauss?
• Why was there controversy in finding the method of least squares?
• Gauss’ treatment of error
• Gauss’ derivation of the method of least squares
• Gauss’ derivation by modern matrix notation
• Gauss-Markov theorem
• Limitations of the method of least squares
• References

2
Johann Carl Friedrich Gauss

• Born: April 30, 1777, Brunswick, Germany
• Died: February 23, 1855, Göttingen, Germany
• At the age of eight, he astonished his teachers during arithmetic class by instantly finding the sum of the first hundred integers.

3
Facts about Gauss
• Attended Brunswick College in 1792, where he discovered many important theorems before even reaching them in his studies.
• Found a square root in two different ways to fifty decimal places by ingenious expansions and interpolations.
• Constructed a regular 17-sided polygon, the first advance in this matter in two millennia. He was only 18 when he made the discovery.

4
Ideas of Gauss

• As a young man, Gauss was a mathematical scientist with interests in many areas, including the theory of numbers, algebra, analysis, geometry, probability, and the theory of errors.

• His interests later grew to include observational astronomy, celestial mechanics, surveying, geodesy, capillarity, geomagnetism, electromagnetism, mechanics, optics, and actuarial science.

5
Intellectual Personality and Controversy
• Those who knew Gauss best found him to be cold and uncommunicative.

• He published only half of his ideas and found no one with whom to share his most valued thoughts.

• In 1805 Adrien-Marie Legendre published a paper on the method of least squares. His treatment, however, lacked a ‘formal consideration of probability and its relationship to least squares’, making it impossible to determine the accuracy of the method when applied to real observations.

• Gauss claimed that he had written to colleagues about the use of least squares as far back as 1795.

6
Formal Arrival of Least Squares
• Gauss
  – Published ‘The Theory of the Motion of Heavenly Bodies’ in 1809, giving a probabilistic justification of the method based on the assumption of a normal distribution of errors. Gauss himself later abandoned the use of the normal error function.
  – Published ‘Theory of the Combination of Observations Least Subject to Errors’ in the 1820s, in which he substituted the root mean square error for Laplace’s mean absolute error.

• Laplace
  – Derived the method of least squares (between 1802 and 1820) from the principle that the best estimate should have the smallest ‘mean error’, the mean of the absolute value of the error.

7
Treatment of Errors

• Using probability theory to describe error
  – Error will be treated as a random variable
• Two types of errors
  – Constant error, associated with calibration
  – Random error

8
Error Assumptions

• Gauss began his study by making two assumptions:
  – Random errors of measurements of the same type lie within fixed limits.
  – All errors within these limits are possible, but not necessarily with equal likelihood.

9
Density Function
We define a function $\varphi(x)$ with the same meaning as a density function, with the following properties:

– The probability of an error lying within the interval $(x, x + dx)$ is $\varphi(x)\,dx$
– Small errors are more likely to occur than large ones
– Positive and negative errors of the same magnitude are equally likely: $\varphi(-x) = \varphi(x)$
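For concreteness (an illustration added here, not on the original slide): the normal error curve that Gauss adopted in his 1809 derivation satisfies all three properties for any precision parameter $h > 0$:
$$\varphi(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^{2} x^{2}},
\qquad
\varphi(-x) = \varphi(x),
\qquad
\int_{-\infty}^{\infty} \varphi(x)\,dx = 1 .$$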

10
Mean and Variance
• Define $k = \int x\,\varphi(x)\,dx$. In many cases we assume $k = 0$.
• Define the mean square error as $m^2 = \int x^2\,\varphi(x)\,dx$.
• If $k = 0$, then the variance will equal $m^2$.
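To make the last point explicit (a one-line derivation added for clarity):
$$\operatorname{Var}(x) = \int (x - k)^2 \varphi(x)\,dx = \int x^2 \varphi(x)\,dx - k^2 = m^2 - k^2 ,$$
so the variance reduces to $m^2$ exactly when $k = 0$; this is the identity quoted again on the ‘More on Variance’ slide.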
11
Reasons for $m^2$

• $m^2$ is always positive and is simple.
• The function is differentiable and integrable, unlike the absolute value function.
• The function approximates the average value in cases where large numbers of observations are being considered, and is simple to use when considering small numbers of observations.

12
More on Variance
If k  0 then variance equals m  k .
2 2

Suppose we have independent random variables {e, e' , e' ' ,...}
with standard deviation 1 and expected value 0.
The linear function of total errors is given by
E  e   ' e'...
k k
M    e   i2
2 2 2
i i
Now the variance of E is given as i 1 i 1


This is assuming every error falls within standard
deviations from the mean
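As a quick numerical sanity check of this variance formula (an added sketch, not part of the original slides; the coefficients $\lambda_i$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, -1.2, 2.0, 0.3])   # arbitrary coefficients lambda_i

# independent errors with mean 0 and standard deviation 1
e = rng.standard_normal((100_000, lam.size))

# linear combination E = lambda*e + lambda'*e' + ... for each simulated draw
E = e @ lam

print("empirical Var(E):  ", E.var())         # approximately sum of lambda_i^2
print("sum of lambda_i^2: ", np.sum(lam**2))
```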

13
Gauss’ Derivation of the Method of Least Squares

• Suppose a quantity V = f(x), where V and x are unknown. We estimate V by an observation L.

• If x is calculated from L by setting L ≈ f(x), error will occur.

• But if several quantities V, V′, V″, … depend on the same unknown x and are determined by inexact observations, then we can recover x by some combination of the observations.

• Similar situations occur when we observe several quantities that depend on several unknowns.

14
Gauss’ Derivation of the Method of Least Squares

Problem:
We want to estimate $V, V', V'', \ldots$ by taking independent observations $L, L', L'', \ldots$, where $V, V', V'', \ldots$ are $\pi$ functions of the $\rho$ unknowns $x, y, z, \ldots$:
$$V = f_1(x, y, z, \ldots), \quad V' = f_2(x, y, z, \ldots), \quad V'' = f_3(x, y, z, \ldots), \ \ldots$$
Let the errors in the observations be
$$v = \frac{V - L}{p}, \quad v' = \frac{V' - L'}{p'}, \ \ldots$$
where the $p$'s are the weights of the ‘mean errors of the observations’.
(Note: we scaled the errors so they have the same variance.)
15
Gauss’ Derivation of the Method of Least Squares

Consider the following linear system:
$$\begin{aligned}
v   &= a x + b y + c z + \cdots + l \\
v'  &= a' x + b' y + c' z + \cdots + l' \\
v'' &= a'' x + b'' y + c'' z + \cdots + l'' \\
    &\;\;\vdots
\end{aligned}$$
Here $v, v', v'', \ldots$ are written as linear functions of the $\rho$ unknowns $x, y, z, \ldots$, where the coefficients $a, b, c, \ldots$ are known.
Note: 1. This system is ‘overdetermined’, since $\pi > \rho$.
      2. The system describes a mapping $F: \mathbb{R}^\rho \to \mathbb{R}^\pi$, i.e. parameter space $(x, y, z, \ldots)$ $\to$ observation space $(v, v', v'', \ldots)$.

16
Solve an optimization problem:

$$\min \; \kappa^2 + \kappa'^2 + \kappa''^2 + \cdots$$
where $\kappa, \kappa', \kappa'', \ldots$ are coefficients of $v, v', v'', \ldots$,
subject to: $\kappa v + \kappa' v' + \kappa'' v'' + \text{etc.} = x - k$
for some constant $k$ independent of $x, y, z, \ldots$.

We can state the problem as follows:
We are looking for a linear mapping $G(v, v', v'', \ldots)$ from $\mathbb{R}^\pi$ to $\mathbb{R}^\rho$ such that:
1. $G \circ F$ is the identity on $\mathbb{R}^\rho$.
2. $G$ satisfies an optimality condition, described below:
   Suppose $x = g(v, v', v'', \ldots)$ is the first component of $G$. Then
   $$x = g(v, v', v'', \ldots) = \kappa v + \kappa' v' + \kappa'' v'' + \cdots + k .$$
   We want $\sum \kappa^2$ to be as small as possible, and we want a similar condition for the other components.
17
Gauss’ Derivation of the Method of Least Squares

Solutions:
$$\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots
  = \alpha^2 + \alpha'^2 + \alpha''^2 + \text{etc.}
  + (\kappa - \alpha)^2 + (\kappa' - \alpha')^2 + (\kappa'' - \alpha'')^2 + \text{etc.}$$
where the $\alpha$'s denote the coefficients derived by elimination of the system. From this it is obvious that the sum $\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots$ attains its minimum when $\kappa = \alpha$, $\kappa' = \alpha'$, $\kappa'' = \alpha''$, etc.

• It’s still not obvious:
  How do these results relate to the least squares estimation?

18
Gauss’ Derivation of the Method of Least Squares


It can be proved that if we let
$$\Omega = v^2 + v'^2 + v''^2 + \cdots
        = \frac{(V(x, y, z, \ldots) - L)^2}{p^2}
        + \frac{(V'(x, y, z, \ldots) - L')^2}{p'^2} + \cdots ,$$
then least squares picks the parameter values that minimize $\Omega$, i.e. the values where all the partials vanish:
$$\frac{\partial \Omega}{\partial x} = 0, \quad
  \frac{\partial \Omega}{\partial y} = 0, \quad
  \frac{\partial \Omega}{\partial z} = 0, \ \ldots$$
and we will get the same results as the minimization of $\kappa^2 + \kappa'^2 + \cdots$.
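In the matrix form introduced on the next slide ($v = Ax + l$), setting these partial derivatives to zero gives the familiar normal equations; this bridging step is added here for clarity:
$$\Omega = \|Ax + l\|^2
\;\Longrightarrow\;
\nabla_x \Omega = 2A^{T}(Ax + l) = 0
\;\Longrightarrow\;
A^{T}A\,x = -A^{T}l ,$$
so $\hat{x} = -(A^{T}A)^{-1}A^{T}l$ whenever $A^{T}A$ is invertible.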

19
Gauss’ derivation by modern matrix notation:
Assume that observable quantities $V_1, V_2, \ldots, V_\pi$ are linear functions of parameters $x_1, x_2, \ldots, x_\rho$ such that
$$V_i = b_{i1} x_1 + \cdots + b_{i\rho} x_\rho + c_i, \qquad b_{ij}, c_i \in \mathbb{R},$$
and we know the values of all the $b_{ij}$ and $c_i$. We measure the $V_i$ in an attempt to infer the values of the $x_i$. Assume $L_i$ is an observation of $V_i$. Switch to a new coordinate system by setting
$$v_i = (V_i - L_i)/p_i .$$
The system becomes
$$v = Ax + l .$$
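A small numerical illustration of this setup (an added sketch, not from the slides; the matrix A, offset l, and true parameters below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined system: 6 scaled observations, 2 unknown parameters
A = rng.standard_normal((6, 2))
x_true = np.array([2.0, -1.0])

# Choose l so that the residuals v = A x + l are small noise at x_true
l = -(A @ x_true) + 0.05 * rng.standard_normal(6)

# Least squares minimizes ||A x + l||^2, i.e. solves A^T A x = -A^T l
x_hat, *_ = np.linalg.lstsq(A, -l, rcond=None)
print("estimated parameters:", x_hat)   # close to x_true
```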

20
Gauss’ derivation by modern matrix notation:

Gauss’ results are equivalent to the following lemma:

Suppose $A$ is a $\pi \times \rho$ ($\pi > \rho$) matrix of rank $\rho$. Then there is a $\rho \times \pi$ matrix $K$ such that
$$KAx = x \quad \text{for all } x \in \mathbb{R}^\rho ,$$
and among all such matrices, the matrix $E = (A^T A)^{-1} A^T$ has rows of minimum norm.

Proof: $E = (A^T A)^{-1} A^T$ satisfies the first condition.
The optimality condition is that
$$\|K_i\|^2 = K_{i1}^2 + \cdots + K_{i\pi}^2$$
should be as small as possible for each row $i$. This is equivalent to demanding that the sum of the diagonal entries of $K K^T$ should be as small as possible.
21
Proof continued:

$A^T A$ is invertible; denote $D = (A^T A)^{-1}$. Thus for all $x$, $x = (A^T A)^{-1} A^T A x = D A^T A x$.
Since $E = (A^T A)^{-1} A^T$, we have $E = D A^T$ and $EAx = x$; also, by assumption, $KAx = x$.
Subtracting, we get $(K - E)Ax = 0$ for all $x$, so $(K - E)A$ is the zero matrix.
Right-multiplying by $D^T$ and noting that $A D^T = E^T$, we get $(K - E)E^T = 0$.
Finally,
$$KK^T = (E + (K - E))(E + (K - E))^T = EE^T + (K - E)(K - E)^T ,$$
since the cross terms vanish. This shows that the solution $E$ is in fact the optimal one, since if $(K - E)$ has any non-zero entries, $(K - E)(K - E)^T$ will have strictly positive entries on its diagonal.

Returning to our original equation $v = Ax + l$, the lemma shows that $G(v) := Ev - El$ is a left inverse of the function $F(x) = Ax + l$, since $G(F(x)) = E(Ax + l) - El = EAx = x$;
and among all linear left inverses, the non-constant part of $G$ is optimal.
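A numerical check of the lemma (an added illustration; any full-column-rank A would do):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))              # full-column-rank matrix (pi = 6, rho = 2)

# The minimum-norm left inverse from the lemma
E = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(E @ A, np.eye(2)))         # True: E A = I

# Build another left inverse K = E + N with N A = 0
N = rng.standard_normal((2, 6))
N -= N @ A @ E                               # now N A = 0, so K A = I as well
K = E + N
print(np.allclose(K @ A, np.eye(2)))         # True
print(np.diag(E @ E.T) <= np.diag(K @ K.T) + 1e-12)  # rows of E are never longer
```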
22
Gauss-Markov theorem

In a linear model
$$x = A\beta + \varepsilon ,$$
where $A$ is an $n \times p$ matrix with rank $p$, $\beta$ is an unknown parameter vector, and $\varepsilon$ is the error vector. If $\mathrm{E}(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$, then for any linear unbiased estimator $\tilde{\theta}$ of $\theta = C^T \beta$, we have
$$\mathrm{E}(C^T \hat{\beta}_{LS}) = \theta
  \quad\text{and}\quad
  \mathrm{Var}(C^T \hat{\beta}_{LS}) \le \mathrm{Var}(\tilde{\theta}) .$$

In other words, when the errors have the same variance and are uncorrelated, the least-squares estimator is the best linear unbiased estimator, i.e. the linear unbiased estimator with the smallest variance.
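A simulation illustrating the theorem (a sketch added for intuition; the competing estimator is an arbitrary alternative linear unbiased estimator, built by adding a component that annihilates A):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 2
A = rng.standard_normal((n, p))
beta = np.array([1.0, 3.0])

E = np.linalg.inv(A.T @ A) @ A.T     # least-squares estimator: beta_hat = E x
N = rng.standard_normal((p, n))
N -= N @ A @ E                       # N A = 0, so K = E + N is also linear and unbiased
K = E + N

ls_est, alt_est = [], []
for _ in range(5000):
    x = A @ beta + rng.standard_normal(n)   # errors: mean 0, variance 1, uncorrelated
    ls_est.append(E @ x)
    alt_est.append(K @ x)

print("component variances, least squares:", np.var(ls_est, axis=0))
print("component variances, alternative:  ", np.var(alt_est, axis=0))  # larger
```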

23
Limitation of the Method of Least Squares

• Nothing is perfect:
  – This method is very sensitive to the presence of unusual data points. One or two outliers can sometimes seriously skew the results of a least squares analysis.
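A tiny demonstration of this sensitivity (an added example with made-up data):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                    # ten points on the clean line y = 2x + 1
y_outlier = y.copy()
y_outlier[-1] += 40.0                # a single wild observation

X = np.column_stack([x, np.ones_like(x)])   # design matrix: slope and intercept

slope_clean, icpt_clean = np.linalg.lstsq(X, y, rcond=None)[0]
slope_bad, icpt_bad = np.linalg.lstsq(X, y_outlier, rcond=None)[0]

print(f"clean fit:        slope {slope_clean:.2f}, intercept {icpt_clean:.2f}")
print(f"with one outlier: slope {slope_bad:.2f}, intercept {icpt_bad:.2f}")
```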

24
References
• Gauss, Carl Friedrich. Translated by G. W. Stewart. 1995. Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Philadelphia: Society for Industrial and Applied Mathematics.
• Plackett, R. L. 1949. A Historical Note on the Method of Least Squares. Biometrika 36:458–460.
• Stigler, Stephen M. 1981. Gauss and the Invention of Least Squares. The Annals of Statistics, Vol. 9, No. 3 (May 1981), 465–474.
• Plackett, Robin L. 1972. The Discovery of the Method of Least Squares.
• Brand, Belinda B. 2003. Gauss’ Method of Least Squares: A Historically-Based Introduction. August 2003.
• http://www.infoplease.com/ce6/people/A0820346.html
• http://www.stetson.edu/~efriedma/periodictable/html/Ga.html

25
