Professional Documents
Culture Documents
A Rapidly Convergent Iterative Method For The Solution of The Genaralised Nonliner Problem
A Rapidly Convergent Iterative Method For The Solution of The Genaralised Nonliner Problem
A Rapidly Convergent Iterative Method For The Solution of The Genaralised Nonliner Problem
A new iterative method is described for the solution of the generalised nonlinear least squares
problem: where the model may be nonlinear in its parameters and in the independent variable(s)
and all variables are subject to error. The method is described for the case of two arbitrarily
related variables; does not require the analytic calculation of derivatives; leads to exceptionally
close satisfaction of the least squares conditions; and exhibits especially rapid convergence arising
from the use of somewhat unconventional numerical approximations for partial derivatives.
Examples are given which compare the results of the method of those of other existing techniques.
(Received October 1970, Revised October 1971)
It is frequently the object of an experiment to find, prove, and/ and extensively applied (Wolberg, 1967; Macdonald, 1969).
or document some causal relation between experimental Southwell (1969) has recently stated, "There has thus far been
variables. Such a relationship may be called a model for the no satisfactory solution to the general case of fitting data with
phenomenon investigated. Since a model generally establishes errors in both co-ordinates to nonlinear functions.' He then
only a small and finite number of connections between several goes on to present what he believes to be such a solution. By
variables and a limited number of parameters, it is always an 'nonlinear functions' Southwell means functions which are not
idealisation of even an isolated part of nature. Once a model has linear in one or more of the aA's.
been selected as 'best' by some (preferably) objective criterion Now Deming's solution does apply to functions nonlinear in
from the finite set of ones thought possible for the situation the parameters involved and has been used for them
. involved, the problem arises of finding the most appropriate (Wentworth, 1965; Wolberg, 1967; Macdonald, 1969).
values, or estimates, for the parameters involved in the model. Unfortunately, Southwell gives no comparison with Deming's
Both model selection and parameter determination are often work on either a theoretical or computational basis. Further,
carried but using the method of least squares. Southwell claims to find exact expressions for the variances of
In ordinary least squares analysis it is customary to consider the parameter value estimates, ak = ak. But Draper and Smith
one 'dependent' variable y and / 'independent' variables, xJt (1966), among others, have pointed out that least squares
where j = 1,2,...,/. The model is then generally written as solutions applied to models involving parameters nonlinearly
y = / ( x , a); the components of the vector a are the u.k para- (hereinafter to be termed 'nonlinear models') lead to bias in the
meters; and k = 1, 2, . . ., m. Conventionally, dependent and usual statistical estimates (see also Macdonald, 1969; Mac-
independent variables are more often distinguished not strictly donald and Powell, 1971). Finally, perhaps a minor point,
by dependence and independence but by the assumption of no, but one beloved of the statisticians: Southwell generally ignores
or negligible, errors in experimental values of the Xj, denoted the distinction between estimated and exact values and claims
Xh and the expected presence of such errors in experimental to give (nonasymptotic) formulae for exact parameters as well
values of y, denoted F,-. Here i = 1, 2, . . ., n, and n is the as exact variances. Evidently, Southwell has not tried out his
number of (Yt, XJt) experimentally determined (/ + l)-tuple iterative solution on many nonlinear models (he presents
values. Both because it is still the most common case and for quantitative results only for straight linefitting);had he done so,
simplicity, the rest of this paper will deal only with two vari- he would have found that even with exceptionally good
ables, y and x, thus taking / = 1. Our results may be readily initial estimates of the parameter values his procedure would
generalised for arbitrary /, however, by well-known methods not have converged to the least squares solution and often
(Deming, 1943; Wolberg, 1967). would not have converged at all.
Although there are experimental situations where errors in X The situation appears to us to be as follows. Notwithstanding
are negligible compared to those in Y, only when a variable is Southwell's work and claims, up to the present there seems to
intrinsically discrete (because of quantum-mechanical or whole- be available no least squares procedure which will converge to
number reasons) is there a possibility of obtaining exact values a true least squares solution when applied to a general non-
of it from experiment. Further, there are many measurement linear model involving weighting of both Y and X. Further,
areas where not only are unavoidable random errors present in there has been no complete assessment made yet of the ade-
both Y and X, but those in X are too large to neglect in the quacy of Deming's solution for such a problem. In the present
usual way. Ordinary least squares, even with weighting of the Y work, we present a solution to the above general problem which
observations, is then inadequate and can lead to bias in esti- exhibits exceptional convergence characteristics. While we do
mated parameter and variance values (Macdonald, 1969; not claim invariable convergence for any conceivable model, we
Macdonald and Powell, 1971). do claim, and demonstrate, a least squares solution when con-
Clearly then, a least squares procedure which takes proper vergence is achieved. Although the problem of bias in estimates
account of errors in both variables is frequently needed. Note obtained from arbitrary nonlinear least squares solutions is
that such a procedure will be useful whether or not one variable, difficult and has not been totally resolved, Box (1971) has
say X, is 'controlled' (by being set as closely as possible to a recently made an important contribution to the area.
given value) or not (Macdonald, 1969). Deming (1943) long Finally, it is important to emphasise that all models (except
ago gave an approximate solution of the problem which has y = a, + a2x) are nonlinear in their parameters when the X
been too little known and used. It has, however, been employed values are not taken exact but are weighted. Even though the
by Wentworth (1965) and, more recently, has been described model itself is formally linear in all its parameters, the x-
Iteration dS dS
S
Number °l °2 da2
Iteration dS dS_
Number da i da-,
derivative values of S given in the table that the least squares dent variable (the case Southwell considered).
minimisation conditions are poorly satisfied by the Deming In order to compare our method directly with that of O'Neill
results. Note the ~ 4 0 % decrease in sak in going from Deming's et ah, we programmed their algorithm (with the omission of
parameter standard deviation estimates to our present estimates. their orthogonal polynomial expansion) for the CDC 6400. For
Table 4 indicates the number of 'iterations' used by various the present straight-line model, the O'Neill algorithm run on
procedures to achieve the least squares solution. We place the the 6400 gave the same parameter estimates and sum of squares
word 'iteration' in quotes as a caution in the examination of as those quoted by O'Neill et ah (for the ICT 1905 computer)
Table 4, since each separate method uses a different iteration at 125 iterations and also at 148, where convergence was
scheme. The table gives only a subjective comparison of con- reached. Thus, it appears likely that the orthogonal poly-
vergence speeds. Southwell's iterative method does not con- nomial modification actually used by O'Neill et ah, makes no
verge at all in this case because his formula for 8x was derived difference in the number of iterations required for convergence,
without the use of the necessary second partials. He does obtain and it seems clear that there were no important errors in either
the correct solution in his own work, however, when he the original numerical work of O'Neill et ah or in our realisa-
tion of their method for the CDC 6400. Using Aitken's
eliminates x f from (1) by solving ^— = 0 exactly. This can be accelerative process, O'Neill et ah get convergence with 13
OXi iterations. In our tests of the O'Neill et ah method, we found
done only when one has a model which is linear in the indepen- that the use of the Epsilon Algorithm (Macdonald, 1964) after
every five steps produced convergence with only 10 iterations.
Our present method does not need accelerative techniques since
Table 4 'Iterations' required to achieve minimum sum of convergence typically occurs very rapidly.
squares, S, for Pearson's data with York's weights Table 4 shows not only that the present general algorithm
O = ax + a2 x) exhibits exceptionally rapid convergence but also that when it
is used with exact analytical derivatives rather than with the
METHOD NO. 'ITERATIONS' special approximations of equations (17), it requires exactly
the same large number of iterations as does the O'Neill et ah
Southwell Did not converge method. Further, even when conventional approximate
O'Neill 148 derivatives (without the x incrementation implicit in our use of
O'Neill (Aitken) 13 equations (17)) are used in the general method, one finds that,
General Method with Exact Derivatives for a judicious choice of Ax = Aa, again 148 iterations are
(Single Precision) 148 required for convergence. These results make it clear that
General Method (Single Precision) 4 indeed the gain in convergence speed in the general method
General Method (Double Precision) 3 arises completely from our unconventional derivative approxi-
Minimum S = 11 •866353 mations. A full theoretical analysis and justification of
a, = 5 •4799 our approach has recently been carried out (Jones), which
completely supports our present claims. As already stated, the
a2 = - 0 •48053
exploration of (x, a) space in the present method compared to
Deming sai = 0-361 5O2 = 0-404 sa3 = 0-128 sa4 = 0-0113 sd = 0-2848
General Method sai = 0-265 sa2 = 0-298 j O 3 = 00918 5O4 = 00080 J d = 0-2844
Table 6 Results of general method using Pearson's data with unity weights on both X and y
a2x + a3jc2 + 0.4.x -1- tx5x •+- a 6 x ;
Iteration
Number ax 5 S
0 5-924 -0-7407 0-02688 -3-324 x 10" 3 2-692 X 10" 3 —;5-208 X 10" 4 0-45300
1 5-911 -0-5813 -01014 003377 -1-925 X 10" 3 -1102 X 10" 4 0-45035
2 5-916 -0-6168 -006605 002114 -5-272 X 10" 5 -2-083 X 10" 4 0-45034
3 5-916 -0-6116 -0-07134 002302 -3-298 X 10" 4 — •939 X 10" 4 0-45033
4 5-915 -0-6019 -008157 002677 -8-950 X 10" 4 — •640 X 10" 4 0-45033
5 5-915 -0-6038 -0-07962 0-02607 -7-899 X 10" 4 — •695 X 10" 4 0-45033
6 5-915 -0-6034 -008011 002624 -8-163 X 10" 4 — •681 X 10" 4 0-45033
7 5-915 -0-6035 -008001 002621 -8-110 X io- 44 — 1-684 X 10" 4 0-45033
8 5-915 -0-6034 -008003 002622 -8-121 X 10" •683 X 10" 4 0-45033
9 5-915 -0-6034 -008002 002621 -8118 X 10" 4 1-683 X 10" 4 0-45033
10 5-915 -0-6034 -008003 002622 -8119 X 10" 4 - •683 X 10" 4 0-45033
Estimates of standard deviations:
Book review
Numerical Initial Value Problems in Ordinary Differential Equations, advantages are economy of arithmetic, especially when changing
by C. William Gear, 1971; 253 pages. {Prentice-Hall, £6-50) step length, and that the method may depend on a smaller number of
This text has 12 chapters on: 1, Introduction (and Euler Method); previous steps, since the k stored values may approximate yT and
2, Higher order one-step methods; 3, Systems of equations and Ayr, tr) for 0 =S r =S {{k - l)/2). Methods investigated include
equations of order greater than one; 4, Convergence, error bounds, those using a fixed number of corrector iterations, as well as iter-
and error estimates for one-step methods; 5, The choice of step size ation to convergence. The only particular multistep or multivalue
and order; 6, Extrapolation methods (of Bulirsch and Stoer, etc.); methods discussed are those of Milne and Adams-Bashforth-
7, Multivalue or Multistep methods—introduction; 8, General Moulton.
multistep methods, order and stability; 9, Multivalue methods; 10, Chapter 10 includes a host of theorems relating the root condition,
Existence, convergence, and error estimates for multivalue methods; stability, consistency, convergence and asymptotic error form. The
11, Special methods for special problems (mainly stiff equations); Dahlquist theory on the maximal order of stable multistep methods is
12, Choosing a method. also given. I did not find this theory easy to understand, partly
The treatment is in general good especially in the more practical because the author treats systems of pth order equations involving
Chapters 5, 6, 11 and 12. Complete FORTRAN routines are (p — q) other derivatives, necessitating norms involving these two
described and given for fourth order Runge Kutta with automatic suffices, and partly because he does not always make clear exactly
step-length control, and for a variable order, variable step-length how the steps in the proof follow from the given hypotheses.
multivalue method with optional provision for stiff equations. A Rather than the usual one used by Gear, I prefer the (to me) more
FORTRAN version of Bulirsch and Stoer's ALGOL procedure is obvious definition of consistency of order r, for a/>th order equation:
also given. These large programs are photographically reproduced EaiZ(tn-l)lhP- E 0, f(z(tn-t) tn-t)
from computer printout. Smaller programs are type-set and much = Z<P> (tn) ~ f(zUn), tn) + O{h').
easier to read.
The requirement that E j8( = 1 and the index of the order h term are
The discussion of one-step methods is shorter than that of Henrici then automatic.
(1962), but all the essential theoretical points are covered. Very many stability concepts—asymptotic stability, absolute and
Multistep methods are treated from the less familiar mu\X\value relative stability regions, stiff A stability, and several others—are
point of view (Gear, 1967), although this is scarcely used in Chapters introduced and clearly explained. Figures 11.2 and 11.3 illustrate the
6 and 7. These multiva/t/e methods are exemplified by the finite difficulty of solving stiff equations, and the advantage of the back-
difference and Nordsieck forms which are equivalent to the Adams- ward Euler method very well.
Bashforth-Moulton method. The stored quantities at each step are
linear combinations of the usual yr and f(yT, t,) but the same approxi- I noticed rather more than the usual number of misprints, and in
mating polynomial is used for all equivalent methods. Possible Continued on page 169
Volume 15 Number 2 155
3