Gauss and the Method of Least Squares

Teddy Petrou Hongxiao Zhu

1
Outline

• Who was Gauss?
• Why was there controversy in finding the method of least squares?
• Gauss’ treatment of error
• Gauss’ derivation of the method of least squares
• Gauss’ derivation by modern matrix notation
• Gauss-Markov theorem
• Limitations of the method of least squares
• References

2
Johann Carl Friedrich Gauss

• Born: April 30, 1777, Brunswick, Germany
• Died: February 23, 1855, Göttingen, Germany
• At the age of eight, he astonished his teachers during arithmetic class by instantly finding the sum of the first hundred integers.

3
Facts about Gauss
• Attended Brunswick College in 1792, where he discovered many important theorems before even reaching them in his studies.
• Found a square root in two different ways to fifty decimal places by ingenious expansions and interpolations.
• Constructed a regular 17-sided polygon, the first advance in this matter in two millennia. He was only 18 when he made the discovery.

4
Ideas of Gauss

• As a young man, Gauss was a mathematical scientist with interests in many areas, including the theory of numbers, algebra, analysis, geometry, probability, and the theory of errors.

• His interests later grew to include observational astronomy, celestial mechanics, surveying, geodesy, capillarity, geomagnetism, electromagnetism, mechanics, optics, and actuarial science.

5
Intellectual Personality and Controversy
• Those who knew Gauss best found him to be cold and uncommunicative.

• He published only half of his ideas and found no one with whom to share his most valued thoughts.

• In 1805 Adrien-Marie Legendre published a paper on the method of least squares. His treatment, however, lacked a ‘formal consideration of probability and its relationship to least squares’, making it impossible to determine the accuracy of the method when applied to real observations.

• Gauss claimed that he had written to colleagues about the use of least squares as far back as 1795.

6
Formal Arrival of Least Squares
• Gauss
  – Published ‘The Theory of the Motion of Heavenly Bodies’ in 1809, giving a probabilistic justification of the method based on the assumption of a normal distribution of errors. Gauss himself later abandoned the use of the normal error function.
  – Published ‘Theory of the Combination of Observations Least Subject to Errors’ in the 1820s, in which he substituted the root mean square error for Laplace’s mean absolute error.

• Laplace
  – Derived the method of least squares (between 1802 and 1820) from the principle that the best estimate should have the smallest ‘mean error’, the mean of the absolute value of the error.

7
Treatment of Errors

• Using probability theory to describe error
  – Error will be treated as a random variable
• Two types of errors
  – Constant error, associated with calibration
  – Random error

8
Error Assumptions

• Gauss began his study by making two assumptions:
  – Random errors of measurements of the same type lie within fixed limits.
  – All errors within these limits are possible, but not necessarily with equal likelihood.

9
Density Function
We define a function $\varphi(x)$ with the same meaning as a density function, with the following properties:

– The probability of an error lying within the interval $(x, x + dx)$ is $\varphi(x)\,dx$
– Small errors are more likely to occur than large ones
– Positive and negative errors of the same magnitude are equally likely: $\varphi(-x) = \varphi(x)$
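For concreteness (an illustration added here, not on the original slide): the normal error curve that Gauss adopted in his 1809 derivation satisfies all three properties for any precision parameter $h > 0$:
$$\varphi(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^{2} x^{2}},
\qquad
\varphi(-x) = \varphi(x),
\qquad
\int_{-\infty}^{\infty} \varphi(x)\,dx = 1 .$$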

10
Mean and Variance
• Define $k = \int x\,\varphi(x)\,dx$. In many cases we assume $k = 0$.
• Define the mean square error as $m^2 = \int x^2\,\varphi(x)\,dx$.
• If $k = 0$, then the variance will equal $m^2$.
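To make the last point explicit (a one-line derivation added for clarity):
$$\operatorname{Var}(x) = \int (x - k)^2 \varphi(x)\,dx = \int x^2 \varphi(x)\,dx - k^2 = m^2 - k^2 ,$$
so the variance reduces to $m^2$ exactly when $k = 0$; this is the identity quoted again on the ‘More on Variance’ slide.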
11
Reasons for $m^2$

• $m^2$ is always positive and is simple.
• The function is differentiable and integrable, unlike the absolute value function.
• The function approximates the average value in cases where large numbers of observations are being considered, and is simple to use when considering small numbers of observations.

12
More on Variance
If k  0 then variance equals m  k .
2 2

Suppose we have independent random variables {e, e' , e' ' ,...}
with standard deviation 1 and expected value 0.
The linear function of total errors is given by
E  e   ' e'...
k k
M    e   i2
2 2 2
i i
Now the variance of E is given as i 1 i 1


This is assuming every error falls within standard
deviations from the mean
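As a quick numerical sanity check of this variance formula (an added sketch, not part of the original slides; the coefficients $\lambda_i$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, -1.2, 2.0, 0.3])   # arbitrary coefficients lambda_i

# independent errors with mean 0 and standard deviation 1
e = rng.standard_normal((100_000, lam.size))

# linear combination E = lambda*e + lambda'*e' + ... for each simulated draw
E = e @ lam

print("empirical Var(E):  ", E.var())         # approximately sum of lambda_i^2
print("sum of lambda_i^2: ", np.sum(lam**2))
```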

13
Gauss’ Derivation of the Method of Least Squares

• Suppose a quantity V = f(x), where V and x are unknown. We estimate V by an observation L.

• If x is calculated from L by setting L ≈ f(x), error will occur.

• But if several quantities V, V′, V″, … depend on the same unknown x and are determined by inexact observations, then we can recover x by some combination of the observations.

• Similar situations occur when we observe several quantities that depend on several unknowns.

14
Gauss’ Derivation of the Method of Least Squares

Problem:
We want to estimate $V, V', V'', \ldots$ by taking independent observations $L, L', L'', \ldots$, where $V, V', V'', \ldots$ are $\pi$ functions of the $\rho$ unknowns $x, y, z, \ldots$:
$$V = f_1(x, y, z, \ldots), \quad V' = f_2(x, y, z, \ldots), \quad V'' = f_3(x, y, z, \ldots), \ \ldots$$
Let the errors in the observations be
$$v = \frac{V - L}{p}, \quad v' = \frac{V' - L'}{p'}, \ \ldots$$
where the $p$'s are the weights of the ‘mean errors of the observations’.
(Note: we scaled the errors so they have the same variance.)
15
Gauss’ Derivation of the Method of Least Squares

Consider the following linear system:
$$\begin{aligned}
v   &= a x + b y + c z + \cdots + l \\
v'  &= a' x + b' y + c' z + \cdots + l' \\
v'' &= a'' x + b'' y + c'' z + \cdots + l'' \\
    &\;\;\vdots
\end{aligned}$$
Here $v, v', v'', \ldots$ are written as linear functions of the $\rho$ unknowns $x, y, z, \ldots$, where the coefficients $a, b, c, \ldots$ are known.
Note: 1. This system is ‘overdetermined’, since $\pi > \rho$.
      2. The system describes a mapping $F: \mathbb{R}^\rho \to \mathbb{R}^\pi$, i.e. parameter space $(x, y, z, \ldots)$ $\to$ observation space $(v, v', v'', \ldots)$.

16
Solve an optimization problem:

$$\min \; \kappa^2 + \kappa'^2 + \kappa''^2 + \cdots$$
where $\kappa, \kappa', \kappa'', \ldots$ are coefficients of $v, v', v'', \ldots$,
subject to: $\kappa v + \kappa' v' + \kappa'' v'' + \text{etc.} = x - k$
for some constant $k$ independent of $x, y, z, \ldots$.

We can state the problem as follows:
We are looking for a linear mapping $G(v, v', v'', \ldots)$ from $\mathbb{R}^\pi$ to $\mathbb{R}^\rho$ such that:
1. $G \circ F$ is the identity on $\mathbb{R}^\rho$.
2. $G$ satisfies an optimality condition, described below:
   Suppose $x = g(v, v', v'', \ldots)$ is the first component of $G$. Then
   $$x = g(v, v', v'', \ldots) = \kappa v + \kappa' v' + \kappa'' v'' + \cdots + k .$$
   We want $\sum \kappa^2$ to be as small as possible, and we want a similar condition for the other components.
17
Gauss’ Derivation of the Method of Least Squares

Solutions:
$$\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots
  = \alpha^2 + \alpha'^2 + \alpha''^2 + \text{etc.}
  + (\kappa - \alpha)^2 + (\kappa' - \alpha')^2 + (\kappa'' - \alpha'')^2 + \text{etc.}$$
where the $\alpha$'s denote the coefficients derived by elimination of the system. From this it is obvious that the sum $\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots$ attains its minimum when $\kappa = \alpha$, $\kappa' = \alpha'$, $\kappa'' = \alpha''$, etc.

• It’s still not obvious:
  How do these results relate to the least squares estimation?

18
Gauss’ Derivation of the Method of Least Squares


It can be proved that if we let
$$\Omega = v^2 + v'^2 + v''^2 + \cdots
        = \frac{(V(x, y, z, \ldots) - L)^2}{p^2}
        + \frac{(V'(x, y, z, \ldots) - L')^2}{p'^2} + \cdots ,$$
then least squares picks the parameter values that minimize $\Omega$, i.e. the values where all the partials vanish:
$$\frac{\partial \Omega}{\partial x} = 0, \quad
  \frac{\partial \Omega}{\partial y} = 0, \quad
  \frac{\partial \Omega}{\partial z} = 0, \ \ldots$$
and we will get the same results as the minimization of $\kappa^2 + \kappa'^2 + \cdots$.
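In the matrix form introduced on the next slide ($v = Ax + l$), setting these partial derivatives to zero gives the familiar normal equations; this bridging step is added here for clarity:
$$\Omega = \|Ax + l\|^2
\;\Longrightarrow\;
\nabla_x \Omega = 2A^{T}(Ax + l) = 0
\;\Longrightarrow\;
A^{T}A\,x = -A^{T}l ,$$
so $\hat{x} = -(A^{T}A)^{-1}A^{T}l$ whenever $A^{T}A$ is invertible.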

19
Gauss’ derivation by modern matrix notation:
Assume that observable quantities $V_1, V_2, \ldots, V_\pi$ are linear functions of parameters $x_1, x_2, \ldots, x_\rho$ such that
$$V_i = b_{i1} x_1 + \cdots + b_{i\rho} x_\rho + c_i, \qquad b_{ij}, c_i \in \mathbb{R},$$
and we know the values of all the $b_{ij}$ and $c_i$. We measure the $V_i$ in an attempt to infer the values of the $x_i$. Assume $L_i$ is an observation of $V_i$. Switch to a new coordinate system by setting
$$v_i = (V_i - L_i)/p_i .$$
The system becomes
$$v = Ax + l .$$
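A small numerical illustration of this setup (an added sketch, not from the slides; the matrix A, offset l, and true parameters below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined system: 6 scaled observations, 2 unknown parameters
A = rng.standard_normal((6, 2))
x_true = np.array([2.0, -1.0])

# Choose l so that the residuals v = A x + l are small noise at x_true
l = -(A @ x_true) + 0.05 * rng.standard_normal(6)

# Least squares minimizes ||A x + l||^2, i.e. solves A^T A x = -A^T l
x_hat, *_ = np.linalg.lstsq(A, -l, rcond=None)
print("estimated parameters:", x_hat)   # close to x_true
```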

20
Gauss’ derivation by modern matrix notation:

Gauss’ results are equivalent to the following lemma:

Suppose $A$ is a $\pi \times \rho$ ($\pi > \rho$) matrix of rank $\rho$. Then there is a $\rho \times \pi$ matrix $K$ such that
$$KAx = x \quad \text{for all } x \in \mathbb{R}^\rho ,$$
and among all such matrices, the matrix $E = (A^T A)^{-1} A^T$ has rows of minimum norm.

Proof: $E = (A^T A)^{-1} A^T$ satisfies the first condition.
The optimality condition is that
$$\|K_i\|^2 = K_{i1}^2 + \cdots + K_{i\pi}^2$$
should be as small as possible for each row $i$. This is equivalent to demanding that the sum of the diagonal entries of $K K^T$ should be as small as possible.
21
Proof continued:

$A^T A$ is invertible; denote $D = (A^T A)^{-1}$. Thus for all $x$, $x = (A^T A)^{-1} A^T A x = D A^T A x$.
Since $E = (A^T A)^{-1} A^T$, we have $E = D A^T$ and $EAx = x$; also, by assumption, $KAx = x$.
Subtracting, we get $(K - E)Ax = 0$ for all $x$, so $(K - E)A$ is the zero matrix.
Right-multiplying by $D^T$ and noting that $A D^T = E^T$, we get $(K - E)E^T = 0$.
Finally,
$$KK^T = (E + (K - E))(E + (K - E))^T = EE^T + (K - E)(K - E)^T ,$$
since the cross terms vanish. This shows that the solution $E$ is in fact the optimal one, since if $(K - E)$ has any non-zero entries, $(K - E)(K - E)^T$ will have strictly positive entries on its diagonal.

Returning to our original equation $v = Ax + l$, the lemma shows that $G(v) := Ev - El$ is a left inverse of the function $F(x) = Ax + l$, since $G(F(x)) = E(Ax + l) - El = EAx = x$;
and among all linear left inverses, the non-constant part of $G$ is optimal.
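A numerical check of the lemma (an added illustration; any full-column-rank A would do):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))              # full-column-rank matrix (pi = 6, rho = 2)

# The minimum-norm left inverse from the lemma
E = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(E @ A, np.eye(2)))         # True: E A = I

# Build another left inverse K = E + N with N A = 0
N = rng.standard_normal((2, 6))
N -= N @ A @ E                               # now N A = 0, so K A = I as well
K = E + N
print(np.allclose(K @ A, np.eye(2)))         # True
print(np.diag(E @ E.T) <= np.diag(K @ K.T) + 1e-12)  # rows of E are never longer
```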
22
Gauss-Markov theorem

In a linear model
$$x = A\beta + \varepsilon ,$$
where $A$ is an $n \times p$ matrix with rank $p$, $\beta$ is an unknown parameter vector, and $\varepsilon$ is the error vector. If $\mathrm{E}(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$, then for any linear unbiased estimator $\tilde{\theta}$ of $\theta = C^T \beta$, we have
$$\mathrm{E}(C^T \hat{\beta}_{LS}) = \theta
  \quad\text{and}\quad
  \mathrm{Var}(C^T \hat{\beta}_{LS}) \le \mathrm{Var}(\tilde{\theta}) .$$

In other words, when the errors have the same variance and are uncorrelated, the least-squares estimator is the best linear unbiased estimator, i.e. the linear unbiased estimator with the smallest variance.
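A simulation illustrating the theorem (a sketch added for intuition; the competing estimator is an arbitrary alternative linear unbiased estimator, built by adding a component that annihilates A):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 2
A = rng.standard_normal((n, p))
beta = np.array([1.0, 3.0])

E = np.linalg.inv(A.T @ A) @ A.T     # least-squares estimator: beta_hat = E x
N = rng.standard_normal((p, n))
N -= N @ A @ E                       # N A = 0, so K = E + N is also linear and unbiased
K = E + N

ls_est, alt_est = [], []
for _ in range(5000):
    x = A @ beta + rng.standard_normal(n)   # errors: mean 0, variance 1, uncorrelated
    ls_est.append(E @ x)
    alt_est.append(K @ x)

print("component variances, least squares:", np.var(ls_est, axis=0))
print("component variances, alternative:  ", np.var(alt_est, axis=0))  # larger
```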

23
Limitation of the Method of Least Squares

• Nothing is perfect:
  – This method is very sensitive to the presence of unusual data points. One or two outliers can sometimes seriously skew the results of a least squares analysis.
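A tiny demonstration of this sensitivity (an added example with made-up data):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                    # ten points on the clean line y = 2x + 1
y_outlier = y.copy()
y_outlier[-1] += 40.0                # a single wild observation

X = np.column_stack([x, np.ones_like(x)])   # design matrix: slope and intercept

slope_clean, icpt_clean = np.linalg.lstsq(X, y, rcond=None)[0]
slope_bad, icpt_bad = np.linalg.lstsq(X, y_outlier, rcond=None)[0]

print(f"clean fit:        slope {slope_clean:.2f}, intercept {icpt_clean:.2f}")
print(f"with one outlier: slope {slope_bad:.2f}, intercept {icpt_bad:.2f}")
```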

24
References
• Gauss, Carl Friedrich. Translated by G. W. Stewart. 1995. Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Philadelphia: Society for Industrial and Applied Mathematics.
• Plackett, R. L. 1949. A Historical Note on the Method of Least Squares. Biometrika 36:458–460.
• Stigler, Stephen M. 1981. Gauss and the Invention of Least Squares. The Annals of Statistics, Vol. 9, No. 3 (May 1981), 465–474.
• Plackett, Robin L. 1972. The Discovery of the Method of Least Squares.
• Brand, Belinda B. 2003. Gauss’ Method of Least Squares: A Historically-Based Introduction. August 2003.
• http://www.infoplease.com/ce6/people/A0820346.html
• http://www.stetson.edu/~efriedma/periodictable/html/Ga.html

25
