
Geophys. J. Int. (1998) 133, 341–362

Gauss–Newton and full Newton methods in frequency–space seismic waveform inversion

R. Gerhard Pratt,1,* Changsoo Shin2 and G. J. Hicks3

1 Department of Geological Sciences, Queen's University in Kingston, Ontario, K7L 3N6, Canada
2 School of Civil, Urban and Geosystem Engineering, College of Engineering, Seoul National University, Seoul 151-742, South Korea
3 Department of Geology, Imperial College, London SW7 2BP, UK

Accepted 1997 October 21. Received 1997 August 7; in original form 1997 January 21

SUMMARY
By specifying a discrete matrix formulation for the frequency–space modelling problem for linear partial differential equations ('FDM' methods), it is possible to derive a matrix formalism for standard iterative non-linear inverse methods, such as the gradient (steepest descent) method, the Gauss–Newton method and the full Newton method. We obtain expressions for each of these methods directly from the discrete FDM method, and we refer to this approach as frequency-domain inversion (FDI). The FDI methods are based on simple notions of matrix algebra, but are nevertheless very general. The FDI methods only require that the original partial differential equations can be expressed as a discrete boundary-value problem (that is, as a matrix problem). Simple algebraic manipulation of the FDI expressions allows us to compute the gradient of the misfit function using only three forward-modelling steps (one to compute the residuals, one to backpropagate the residuals, and a final computation to compute a step length). This result is exactly analogous to earlier backpropagation methods derived using methods of functional analysis for continuous problems. Following from the simplicity of this result, we give FDI expressions for the approximate Hessian matrix used in the Gauss–Newton method, and the full Hessian matrix used in the full Newton method. In a new development, we show that the additional term in the exact Hessian, ignored in the Gauss–Newton method, can be efficiently computed using a backpropagation approach similar to that used to compute the gradient vector. The additional term in the Hessian predicts the degradation of linearized inversions due to the presence of first-order multiples (such as free-surface multiples in seismic data). Another interpretation is that this term predicts changes in the gradient vector due to second-order non-linear effects. In a numerical test, the Gauss–Newton and full Newton methods prove effective in helping to solve the difficult non-linear problem of extracting a smooth background velocity model from surface seismic-reflection data.

Key words: diffraction, finite-difference methods, inversion, numerical techniques, seismic velocities, wave equation.

* Formerly at: Department of Geology, Imperial College, London SW7 2BP, UK.

INTRODUCTION

During the last decade, seismic waveform inversion has been tackled by applied mathematicians and geophysicists with an increasing degree of success. To our knowledge, the first attempt at waveform inversion was made by Lines & Kelly in 1983 (L. Lines, personal communication, 1992). Lines & Kelly computed partial derivatives of the seismogram with respect to the coordinates of a wedge-shaped model, using a numerical difference scheme via a finite-difference forward-modelling technique. At that time, the research was considered too expensive to pursue. An important step was taken by Lailly (1983) and Tarantola (1984), who recognized that the steepest descent direction for the inverse problem (the negative gradient) could be computed without computing the partial derivatives explicitly. They showed (for the acoustic wave equation) how the gradient of the misfit function could be computed by 'backpropagating' the data residuals and correlating the result with forward-propagated wavefields, in a manner very similar to many pre-stack migration algorithms. Numerical results using this approach

© 1998 RAS 341


were given by Kolb, Collino & Lailly (1986) and Gauthier, Virieux & Tarantola (1986). Mora (1987a) extended the technique to elastic problems and provided synthetic numerical examples using complex geological structures. These authors all formulated their methods in the time domain; Pratt & Worthington (1990) and Pratt (1990) applied the same idea to inverse problems in the frequency domain and used an implicit frequency-domain finite-difference technique to provide the forward model.

Several researchers have applied these methods to ray theoretical solutions of the wave equation (Beydoun et al. 1989; Lambare et al. 1992). The local nature of the ray approximation makes this approach computationally attractive, and ray-based techniques have enjoyed considerable application to real reflection data (Beydoun et al. 1989; Beydoun et al. 1990). Some progress has also been made outside the ray paradigm in terms of application to reflection data (Crase et al. 1990). Some success has been achieved in tomographic (transmission) imaging from cross-borehole data by utilizing gradient methods for waveform inversion in conjunction with finite-difference modelling. Results with real cross-borehole data have been demonstrated by Zhou et al. (1995), who utilized a time-domain approach, and Song, Williamson & Pratt (1995) and Pratt et al. (1995), who utilized a frequency-domain approach.

For large problems such as those summarized in the previous paragraph, gradient methods seem to be the method of choice, as they do not require the inversion of large matrices. Although gradient methods, such as the conjugate gradient method, can be formulated that have quadratic convergence, it can take a significant number of iterations before quadratic convergence is established. Shin (1988) applied a Gauss–Newton method to invert seismic data using a frequency-domain finite-element method. The possibility of using the full Newton method for small problems has been investigated by Santosa (1987). When the number of model parameters is large, Newton algorithms involve the computation and inversion of large matrices. This can be circumvented, particularly during the early stages of iterative inversions, by restricting the number of model parameters, in which case the matrices are considerably smaller. As forward-modelling techniques are refined, it is now feasible to consider the more rapidly converging Gauss–Newton method and the full Newton method using workstation computational facilities.

In this paper we present a new formalism for posing the seismic waveform inversion problem, based on a discrete frequency–space forward-modelling procedure that we term frequency-domain modelling (FDM). FDM methods can be posed as either finite-difference problems or finite-element problems. It was shown by Marfurt (1984) that the multiple-source numerical modelling problem was best approached using FDM methods; further developments in FDM methods (Jo, Shin & Suh 1996; Stekl & Pratt 1997) have recently made this approach still more attractive. By specifying the FDM method from the outset, we are able to introduce a discrete matrix formalism for the seismic-waveform inverse problem directly in the frequency–space domain. We term this approach frequency-domain inversion (FDI). The combined FDM/FDI approach allows us to replace the notions of functional analysis of, for example, Tarantola (1987), with the simpler notions of matrix algebra. This does not depend on the utilization of a specific wave equation, or on a specific parametrization. Many partial differential equations, including the simplest 1-D scalar wave equation, the full 3-D viscoelastic anisotropic wave equation, or even electromagnetic and potential field equations, can (in principle) be treated with the same formalism.

The FDM/FDI formalism leads to a matrix algebra demonstration of Lailly's (1983) fast method for computing the gradient of the misfit function for the waveform inversion problem. Leading on from this, we show how the approximate Hessian matrix is structured and used in Gauss–Newton inversion, and we provide a new, fast method for computing the Hessian matrix required for the full Newton inversion technique. In this paper we present two numerical examples. The first is a 2-D imaging example. The gradient method for 2-D seismic imaging and inversion has been demonstrated by many authors; here we too produce such a demonstration, but we extend the demonstration to illustrate the application of the Gauss–Newton method to the same problem. In our second, more significant example we illustrate the utility of the full Newton method (using the exact Hessian matrix) in helping to solve a classic, but difficult, problem in seismic waveform inversion: the problem of determining the low-wavenumber (or 'background') seismic velocity variation from surface reflection data.

Our paper is organized into three sections. In the first section we present the FDM method, the discrete forward-modelling method on which all our results are based. In the second section, we recast the classical techniques of iterative inversion using the FDI formalism, we illustrate these methods with a numerical, 2-D inversion example, and we show how the additional (non-linear) term in the full Hessian can be efficiently computed by backpropagation techniques. In the third and final section of this paper, we present our second, more detailed numerical example illustrating the application of the full Newton method to reflection seismology.

Our paper contains two appendices. Our first appendix gives explicit formulae for the results of this paper when the model is parametrized using alternative linear basis functions, and compares these approaches to the subspace search method of Kennett, Sambridge & Williamson (1988) and the multiscale approach advocated by Bunks et al. (1995). Our second appendix compares the use of the additional term in the Hessian to a similar (but not identical) non-linear approach developed by Snieder (1990).

FORWARD MODELLING IN THE SPACE–FREQUENCY DOMAIN

The discretized equations for the acoustic or elastic wave equations using either a finite-difference or a finite-element approach can be written as

M ∂²ū(t)/∂t² + K ū(t) = f̄(t)  (1)

(see for example Marfurt 1984), where ū(t) is the discretized wavefield (that is, the pressure or the displacement) arranged as a column vector, M is the mass matrix, K is the stiffness matrix and f̄(t) are the source terms, also arranged as a column vector. Eq. (1) includes the boundary conditions implicitly; the actual form of the boundary conditions will alter the various matrix coefficients. If viscous damping is included, eq. (1) becomes

M ∂²ū(t)/∂t² + C ∂ū(t)/∂t + K ū(t) = f̄(t) ,  (2)


where C is the damping matrix. Details of the finite-element and finite-difference approaches can be found in many textbooks (Bathe & Wilson 1976; Zienkiewicz & Taylor 1989). The mass, stiffness and damping matrices are computed by forming a discrete representation of the underlying (spatial) partial differential equations and the physical parameters (for example, the seismic velocities, the bulk density and the attenuation parameters). [Our description is in terms of wave propagation modelling, but the modelling method, and the following inverse methods, can be applied to any physical system that can be represented using eq. (2).] It is convenient (but not required) that source–receiver reciprocity holds. The reciprocity property depends on the underlying physics of the problem as well as on the details of the numerical approach. In particular, reciprocity depends on the boundary conditions of the problem.

Eqs (1) or (2) are expressed in the time–space domain. We now choose to implement a frequency-domain solution. This allows distinct computational advantages for multisource problems (Marfurt 1984; Pratt 1990). As we shall show, this leads to a straightforward formalism for the inverse methods employed by Tarantola (1984) and others. Taking the temporal Fourier transform of eq. (2) yields

K u(ω) + iω C u(ω) − ω² M u(ω) = f(ω) ,  (3)

where

u(ω) = ∫_{−∞}^{+∞} ū(t) e^{−iωt} dt   and   f(ω) = ∫_{−∞}^{+∞} f̄(t) e^{−iωt} dt .  (4)

For simplicity we rewrite eq. (3) as

S u = f   or   u = S⁻¹ f ,  (5)

where the complex 'impedance' matrix, S, is given by S = K − ω² M + iω C. Frequency-domain modelling is an implicit finite-difference method (Marfurt 1984); the second, explicit, form shown in eq. (5) is only representational, as it is not generally possible (or desirable) to actually invert the very large impedance matrix S. Eq. (5) is often solved using direct matrix factorization methods, such as LU decomposition (Press et al. 1992; Pratt 1990). If LU decomposition is used to solve eq. (5), the matrix factors can be re-used to solve rapidly the forward problem for any new source vector, f [for typical problem sizes, the additional number of floating-point operations required to generate the solution for a new source term is less than 1 per cent of the number of operations required to generate the LU matrix factors; see Stekl & Pratt (1997) for details]. This point is especially important in the iterative solution of the inverse problem, in which many forward solutions for real sources and 'virtual' sources will be required at each iteration. It is critical to use ordering schemes that allow maximum advantage to be taken of the sparsity of both S and its LU factorization; nested dissection (Liu & George 1981; Marfurt & Shin 1989) is such a method.

We shall refer to any modelling approach based on eq. (5) as 'frequency-domain modelling', or FDM. The analysis given in this paper will be generally applicable to the inversion of any forward problem, geophysical or otherwise, that can be cast in the form of eq. (5), although the physical interpretation of the results herein will be specific to the seismic problem. We now introduce a specific discretization, depicted in Fig. 1, in which the wavefield is to be computed at n_x × n_z = l nodal points on a regular grid (the grid is 2-D for illustration purposes, but could be 1-, 2- or 3-D).

Figure 1. A discrete representation of the forward-modelling problem. The representation is schematic; the assumption of two dimensions is not required, nor is this ordering of the node points necessary. The wavefield (either a scalar or a vector quantity) is sampled at each of the n_x × n_z = l node points. Receiver data is synthesized at the first n of these node points.

The model can be thought of as being specified at each of these node points, but such a parametrization is not necessary, nor even particularly desirable. Since finite-difference or finite-element methods generally use a discretization interval that is far smaller than the resolution of most realistic earth models, it is often sensible to define the model in terms of an alternative parameter set, p, an m × 1 column vector, where m ≠ l in general (and usually m ≪ l). In terms of generating forward-modelled data, it is only necessary to be able to compute the l × l impedance matrix S(p). As we shall see later, when considering the inverse problem we will also require expressions for ∂S/∂p_i for all m model parameters. (In Appendix A we discuss parametrization in more general terms.) We also assume that all model parameters are real valued. Certain physical parameters can be represented as complex numbers (for example, velocity and attenuation can be jointly represented by a complex velocity); here we represent the imaginary parts explicitly as additional real-valued parameters.

The wavefield vector, u, and the source vector, f, are l × 1 column vectors; the complex impedance matrix, S, is an l × l matrix. All quantities except the model parameters are complex quantities. Note that, although we will treat eq. (5) as if it describes forward modelling for a single source position, additional source locations can be incorporated simply by increasing the number of elements in u by l for each additional source; S and S⁻¹ then have block diagonal structures, in which each diagonal block is an identical submatrix. We could also represent additional frequency components in the same manner, although the diagonal block submatrices of S would then no longer be identical. The same comment applies to the 2.5-D method of Song & Williamson (1995), in which a new diagonal block would be generated for each wavenumber considered. Ultimately we could end up with a complete time-domain representation in this manner; usually, however, it would seem to be prohibitively expensive, and unnecessary, to actually proceed in this manner.

By examining the solutions to eq. (5) when the components of the source vector, f_i, are replaced by a Kronecker delta, δ_ij, it is clear that the columns of S⁻¹ must contain the discrete approximations to the Green's functions. Thus,

S⁻¹ = [g⁽¹⁾ g⁽²⁾ . . . g⁽ˡ⁾] ,  (6)


where the column vectors g⁽ʲ⁾ approximate the discretized Green's function for an impulse at the jth node. If the numerical problem is exactly reciprocal with respect to an interchange of source and receiver elements, then both S and S⁻¹ are symmetric matrices, and not Hermitian (self-adjoint) matrices. [In implementation, S is often not perfectly symmetric when certain (unphysical) absorbing boundary conditions are implemented (Pratt 1990). This does not cause problems in the iterative inverse solutions we will present in the next section.]

THE INVERSE PROBLEM IN THE SPACE–FREQUENCY DOMAIN

In this section we shall review methods for solving the non-linear seismic-waveform inversion problem using local methods (for the size of problems we are interested in, it is still not feasible to consider a global search for the optimal parameter set). By posing the inverse problem in the frequency domain, and specifically assuming the forward problem can be solved by FDM as in eq. (5), we shall obtain specific algorithms for FDI that are straightforward to understand and implement, and additionally are perfectly general for any forward problem that can be represented by FDM.

We suppose we have n experimental observations, d, at a subset of nodal points corresponding to receiver locations. It is convenient to assume the node points are ordered in such a way that the first n ≪ l node points are receiver locations (see Fig. 1, but the results we obtain are not specific to such an ordering scheme). The inverse problem is to infer a set of model parameters, p, that would predict the observations using our forward-modelling algorithm. We also assume we have a suitable initial model, p⁽⁰⁾, that is close enough to the global solution to allow successive relinearizations. (In seismic data analysis, such models can often be generated by traveltime analysis.) Given an initial model, we can calculate the response, using the FDM method, u = S⁻¹ f.

The residual error at the n receiver node points, δd, is defined as the difference between the initial model response and the observed data at the receiver locations. Thus

δd_i = u_i − d_i ,   i = (1, 2, . . . , n) ,  (7)

where the subscripted quantities are the individual components of δd, u and d, and the subscript i represents the receiver number.

As is common in many inverse problems, we seek to minimize the l₂ norm of the data residuals. Thus we seek to minimize the 'misfit' function (or 'objective' function)

E(p) = ½ δdᵗ δd* ,  (8)

where the superscript t represents the ordinary matrix transpose and the superscript * represents the complex conjugate, introduced to ensure the misfit function is a true (real-valued) norm for complex-valued data.

The simple misfit function in eq. (8) neglects any incorporation of a priori statistical information on the data or on the model in the form of covariance matrices. We have, in effect, assumed an identity data covariance matrix and assumed infinite a priori model variances. More realistic covariance matrices may of course be incorporated, but we deliberately omit them as they tend to obscure the simplicity of the algorithms. In the solution of real inverse problems, these terms provide numerical stability, allow meaningful prior statistical information to be incorporated, and allow meaningful a posteriori statistical information to be extracted (see e.g. Tarantola 1987).

The gradient method of inversion

The gradient method is a recipe for reducing the l₂ norm (8) by iteratively updating the parameter vector according to

p⁽ᵏ⁺¹⁾ = p⁽ᵏ⁾ − α⁽ᵏ⁾ ∇_p E⁽ᵏ⁾ ,  (9)

where k is an iteration number and α is a step length (a positive scalar) chosen to minimize the l₂ norm in the direction given by the gradient of E(p). The role of the step length can also be thought of as converting the units of the gradient vector to model dimensions.

The gradient of the misfit function represents the direction in which the misfit function is increasing most rapidly. The misfit function can always be reduced by pursuing the negative of this direction. The iteration in eq. (9) is performed until some suitable stopping criterion is reached. The convergence rate of the gradient method is generally quite slow; convergence can be improved by adopting a conjugate gradient approach (see for example Mora 1987a), which does not require any significant additional computations.

We may evaluate the gradient direction by taking partial derivatives of eq. (8) with respect to the model parameters, p, yielding

∇_p E = ∂E/∂p = Re{Jᵗ δd*} ,  (10)

where Jᵗ is the transpose of the n × m Fréchet derivative matrix, the (complex-valued) elements of which are given by

J_ij = ∂u_i/∂p_j ,   i = (1, 2, . . . , n) ;  j = (1, 2, . . . , m) .  (11)

In eq. (10) we assume there are m model parameters, so that p and ∇_p E are column vectors of length m. The mathematics demand that the real part of the complex-valued vector Jᵗ δd* be taken; this ensures the gradient of the real-valued misfit function with respect to real-valued parameters remains real. (It is also possible to obtain a real-valued result automatically if each frequency component of the data is accompanied by the corresponding negative frequency component. This makes use of the conjugate symmetry of the Fourier transform of real-valued, time-domain data.)

The computation of the step length, α, required in eq. (9) is straightforward. For linear forward problems the step length is given by the formula

α⁽ᵏ⁾ = |∇_p E|² / |J ∇_p E|² ,  (12)

where | · | represents the Euclidean length of the vectors. For non-linear forward problems (such as the seismic problem), the step length must be found using line-search techniques along the direction opposite to the gradient.

We now wish to link explicitly the computation of the gradient vector for the FDI problem to the forward FDM problem given in eq. (5). To do this we first augment the n × m matrix J with the additional terms required to define partial


derivatives at all node points, not just at the receiver locations, to obtain a new l × m matrix Ĵ. We may then write a new equation, equivalent to eq. (10):

∇_p E = Re{Ĵᵗ δd̂*} ,  (13)

where δd̂ is the data residual vector, of length n, augmented with (l − n) zero values to produce a new vector of length l. Explicitly, eq. (13) represents

$$
\begin{bmatrix} \partial E/\partial p_1 \\ \partial E/\partial p_2 \\ \vdots \\ \partial E/\partial p_m \end{bmatrix}
=\mathrm{Re}\left\{
\begin{bmatrix}
\partial u_1/\partial p_1 & \cdots & \partial u_n/\partial p_1 & \partial u_{n+1}/\partial p_1 & \cdots & \partial u_l/\partial p_1 \\
\partial u_1/\partial p_2 & \cdots & \partial u_n/\partial p_2 & \partial u_{n+1}/\partial p_2 & \cdots & \partial u_l/\partial p_2 \\
\vdots & & \vdots & \vdots & & \vdots \\
\partial u_1/\partial p_m & \cdots & \partial u_n/\partial p_m & \partial u_{n+1}/\partial p_m & \cdots & \partial u_l/\partial p_m
\end{bmatrix}
\begin{bmatrix} \delta d_1^{*} \\ \vdots \\ \delta d_n^{*} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\right\}
=\mathrm{Re}\left\{
\begin{bmatrix}
\partial\mathbf{u}^{t}/\partial p_1 \\
\partial\mathbf{u}^{t}/\partial p_2 \\
\vdots \\
\partial\mathbf{u}^{t}/\partial p_m
\end{bmatrix}
\begin{bmatrix} \delta d_1^{*} \\ \vdots \\ \delta d_n^{*} \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\right\}
\qquad(14)
$$

(recall, n is the number of receiver points, l is the number of node points, and we have ordered the node points in such a manner that the first n node points correspond to the n receiver points).

An expression for any of the partial derivatives in eq. (14) in terms of the forward-modelling matrix eq. (5) can now be obtained by taking the partial derivative of both sides of eq. (5) with respect to the ith parameter p_i:

S ∂u/∂p_i = −(∂S/∂p_i) u   or   ∂u/∂p_i = S⁻¹ f⁽ⁱ⁾ ,  (15)

where we have introduced the ith 'virtual' source term

f⁽ⁱ⁾ = −(∂S/∂p_i) u ,  (16)

itself an l × 1 vector. This development is the same as that used by Oristaglio & Worthington (1980) and Rodi (1976) to develop partial derivatives for the electromagnetic problem. By analogy with eq. (5), the partial derivatives in eq. (15) are the solution to a new forward-modelling problem, now driven by the virtual sources at the location of the ith parameter. The virtual source represents the interaction (or scattering) of the predicted (or background) wavefield, u, with the parameter p_i. We will therefore refer to ∂u/∂p_i as the 'partial derivative wavefield from the ith node' (p_i does not necessarily represent a nodal parameter, but it is convenient to think of it this way). As shown in eq. (14), each row of Ĵᵗ (i.e. each column of Ĵ) contains a partial derivative wavefield from a single physical parameter; there are m such columns. If the inversion parameters are defined as the (discrete) values of a single physical parameter at the node points (the 'point collocation' scheme), there will be m = l columns and Ĵ is a square matrix. Furthermore, the virtual sources will then be highly local, approximating point sources, with a seismic moment that depends on the details of the parametrization. In this case the computation is directly comparable with the Born method for computing wavefield perturbations due to first-order scattering.

The computation of the partial derivatives of the impedance matrix, ∂S/∂p_i, required in the computation of the virtual sources (15), will depend on the specific details of the finite approximation method used (that is, on the type of wave equation, the parametrization, the order of the differencing scheme, the basis functions, etc.). However, in all cases these partial derivatives are relatively trivial to compute [see Shin (1988) and our Appendix A].

Efficient calculation of the gradient direction

Since we could generate an equation similar to eq. (15) for any choice of i, we can represent all the partial derivatives simultaneously by the matrix equation

Ĵ = [∂u/∂p₁ ∂u/∂p₂ . . . ∂u/∂p_m] = S⁻¹ [f⁽¹⁾ f⁽²⁾ . . . f⁽ᵐ⁾]   or   Ĵ = S⁻¹ F ,  (17)

where F is an l × m matrix, the columns of which are the virtual source terms for each of the m physical parameters. Eq. (17) gives an explicit formula for the Fréchet derivative matrix J (being the first n ≪ l rows of Ĵ). Computation of the elements of J using eq. (17) would require m forward-propagation problems to be solved, in addition to the one required to compute the virtual sources using eq. (16). However, in order to compute the gradient using eqs (10) or (13) it is not necessary to compute the elements of J explicitly. Substituting (17) into (13) we obtain

∇_p E = Re{Ĵᵗ δd̂*} = Re{Fᵗ [S⁻¹]ᵗ δd̂*}  (18)

or

∇_p E = Re{Fᵗ v} ,  (19)

where

v = [S⁻¹]ᵗ δd̂*  (20)

is the 'backpropagated wavefield'. If the inverse impedance matrix S⁻¹ is symmetric, as we expect for reciprocal problems (see eq. 6), then [S⁻¹]ᵗ = S⁻¹ and

v = S⁻¹ δd̂* .  (21)

(If the inverse impedance matrix is not symmetric, the transpose of the LU decomposition, Uᵗ Lᵗ, may be used to generate v by the usual forward-reduction and back-substitution processes.) In either eq. (20) or eq. (21), computing the new, backpropagated wavefield, v, requires only one additional forward problem to be solved. Since the computation of the step length (in the linear approximation) also requires the solution of a forward problem, this brings the total number of forward solutions required to three. The forward and backpropagation problems are solved in the same model, and hence the stored LU factors of S can still be used to compute these forward solutions rapidly.
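The economy of this result is easy to verify numerically. The following is a minimal dense-matrix sketch of the backpropagation gradient of eqs (15)-(21), not the sparse LU implementation assumed in the text: a small random symmetric matrix S stands in for the impedance matrix, the diagonal matrices dS[i] are hypothetical local derivatives ∂S/∂p_i (as in a point collocation scheme), and the result is checked against the brute-force Fréchet-derivative computation of eqs (10) and (17).

```python
import numpy as np

# Dense stand-ins for the quantities in eqs (5)-(21); sizes are
# illustrative only (l node points, n receivers, m parameters).
rng = np.random.default_rng(0)
l, n, m = 12, 4, 3

A = rng.standard_normal((l, l)) + 1j * rng.standard_normal((l, l))
S = A + A.T + 10 * np.eye(l)     # symmetric (reciprocal), well conditioned
f = rng.standard_normal(l) + 0j  # source vector
d = rng.standard_normal(n) + 0j  # "observed" data at the first n nodes
dS = [np.diag(rng.standard_normal(l)) for _ in range(m)]  # local dS/dp_i

# Solve 1: forward model, u = S^{-1} f (eq. 5).
u = np.linalg.solve(S, f)

# Residuals at the receivers, zero-padded to length l (eqs 7 and 13).
dd_hat = np.zeros(l, dtype=complex)
dd_hat[:n] = u[:n] - d

# Solve 2: backpropagate the conjugated residuals, v = S^{-1} dd_hat* (eq. 21).
v = np.linalg.solve(S, np.conj(dd_hat))

# Virtual sources f^(i) = -(dS/dp_i) u (eq. 16) and gradient (eq. 19).
F = np.column_stack([-dSi @ u for dSi in dS])
grad = np.real(F.T @ v)

# Brute-force check: m extra solves give the Frechet matrix (eq. 17),
# and eq. (10) gives the same gradient from its first n rows.
J = np.linalg.solve(S, F)[:n, :]
grad_ref = np.real(J.T @ np.conj(dd_hat[:n]))
assert np.allclose(grad, grad_ref)
```

Because S is symmetric here, [S⁻¹]ᵗ = S⁻¹ and the backpropagation route reaches the same gradient with two solves instead of m + 1; in a practical implementation both solves would re-use one stored LU factorization of S.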


In the development above the original impedance matrix (and not its adjoint) is used to compute the backpropagated field, and the conjugate of the data residuals is used in eqs (20) and (21). An alternative development can be obtained by recognizing that

∇_p E = Re{Ĵᵗ δd̂*} = Re{[Ĵ*]ᵗ δd̂} .  (22)

This leads to an alternative definition of the backpropagated field,

w = v* = [S⁻¹*]ᵗ δd̂ ,  (23)

or, for reciprocal problems,

w = v* = [S⁻¹]* δd̂ .  (24)

In eq. (23) or (24) we do not conjugate (time reverse) the data residuals, but we use the adjoint of the impedance matrix to compute the backpropagated field. We still use eq. (19) to compute the gradient.

Either form of backpropagation can be used, eq. (20) or (23). The former is easier, as the original impedance matrix may be used, rather than its adjoint. The latter form [the backpropagation method given in eq. (23) or (24)] exactly parallels the backpropagation method derived by Lailly (1983) using continuous methods of functional analysis. The FDI formalism replaces the notions of functional analysis with the simpler notions of matrix algebra; this will prove useful in the next section when we consider the Newton methods of inversion. Ultimately, computer programs for gradient methods can be developed from either formalism without any fundamental differences. Tarantola (1986), Mora (1987a), Kolb et al. (1986), Pratt (1990), Pratt & Worthington (1990) and Chavent & Jacewitz (1995) have all exploited this technique in computing the gradient of the misfit function for non-linear waveform inversion.

The backpropagation technique given by Lailly (1983) makes use of the adjoint state method, which has been used extensively in optimal control theory since at least 1956 (Goodman & Lance 1956; Jurovics & McIntyre 1962; Lee & Marcus 1967). The matrix derivation of the backpropagation technique given in eqs (18) to (21) was derived by one of us (Shin 1988) several years ago, but the result has not received much attention in the literature. An identical result has since been independently obtained by Griewank (1989) in the field of photon diffusion.

We may summarize the backpropagation method of eq. (21) as follows. The gradient is computed in two steps: (1) The 'backpropagated' wavefield, v, is computed by solving a forward problem with the original impedance matrix, and with the source terms replaced by the conjugate (time-reversed) […] using the unreversed data residuals and the time-reversed (adjoint) wave equation.

It is informative to use eqs (16) and (19) to express the ith component of the gradient vector as

(∇_p E)_i = Re{[f⁽ⁱ⁾]ᵗ v} = Re{ −uᵗ (∂Sᵗ/∂p_i) v } ,  (25)

from which it is evident that where ∂Sᵗ/∂p_i consists of only highly local non-zero values at, or near, the diagonal of the ith row (as it will for the point collocation scheme), the gradient can be computed by a scaled multiplication (convolution in the time domain) of forward and backpropagated wavefields. This is the description usually given for the computation of the gradient vector, and it is clearly closely related to reverse-time migration algorithms, and to Claerbout's (1976) U/D imaging principle. The analogy with pre-stack reverse-time migration has been made by many authors, and is further illustrated in the next section of this paper.

A physical interpretation of the gradient

In this section we examine the role the partial derivative wavefields play in the gradient method, and we illustrate the effectiveness (or otherwise) of the gradient vector as an estimate of the parameter updates by using a point diffractor model. Although the illustration we provide here is rather classical, and does not rely on the FDI formalism, we provide it in order to introduce the physical concepts of virtual sources and partial derivative wavefields that are crucial to understanding the FDI implementations of the Gauss–Newton method and the full Newton method to follow.

In order to illustrate these concepts, we use the small point diffractor model depicted in Fig. 2. A single source is located centrally just below the surface, and 21 receivers are distributed just below the surface from one end of the model to the other. Synthetic frequency-domain data were computed using the FDM method, followed by Fourier synthesis to create time-domain data, using 16 frequencies between 0 and 25 Hz. We computed a residual wavefield by forward modelling twice (with and without the anomaly), and we computed a number of representative partial derivative wavefields with respect to (point) perturbations of the velocities at the nodes indicated in Fig. 3. The partial derivative wavefields were computed using eqs (15) and (16). Owing to the weak nature of the anomaly, we expect the partial derivative wavefields to be a good representation of the wavefield perturbations that would be observed for a true perturbation of the velocity.

Since the behaviour of seismic data is easier to visualize in the time domain than in the frequency domain, we will illustrate the operation of the gradient method using the resultant time-
residuals. (2) The backpropagated ¢eld is multiplied by the domain wave¢elds (i.e. seismograms) depicted in Figs 3(a) to
virtual sources generated by the original predicted wave¢eld, u. (e). These ¢gures show the time-domain surface expressions of
Finally, we take the real part of the result. These operations can ¢ve partial derivative seismograms. The partial di¡erentiation
be compared with the equivalent time-domain operations pre- was computed with respect to the velocity parameters at
sented by Mora (1987a): Mora computed the gradient using a each of the ¢ve scattering locations shown in Fig. 3. From the
zero-lag cross-correlation of each virtual source time series preceding analysis (see eq. 17), we recognize that these partial
with the backpropagated wave¢eld. The operation represented derivative seismograms represent forward propagation using
by eq. (18) is in fact a convolution of the backpropagated ¢eld virtual sources at the scattering locations. The virtual sources
with the virtual sourcesöthe di¡erence is due to the time are excited, or activated, by the background wave¢eld. In
reversal of the data residuals implied in eq. (21), and due to the the space^time domain, the virtual sources have numerical
fact that we use the original wave equation to propagate these. support only at the scattering locations and when the back-
Mora (1987a) formed a di¡erent backpropagated wave¢eld by ground wave¢eld arrives at the scattering location. In the

ß 1998 RAS, GJI 133, 341^362

GJI000 21/4/98 09:13:15 3B2 version 5.20


The Charlesworth Group, Huddersfield 01484 517077
absence of other events in the background wavefield, the virtual sources will return to zero after the background wavefield has passed through the cell. Because of this, the partial derivative seismograms display characteristic hyperbolic point diffraction responses. (In the frequency domain, the phase of the complex Fourier component associated with each trace in Fig. 3 changes across the receiver array hyperbolically.)

Figure 2. Test model containing a single anomaly embedded in a 1600 m s⁻¹ homogeneous velocity model. The velocity of the anomaly is 1800 m s⁻¹. The single-source, multiple-receiver reflection geometry used to test the gradient and Gauss–Newton methods is shown in the figure. The numerical experiment used 16 frequencies between 1 and 25 Hz. The five node points (a), (b), (c), (d) and (e) used to compute the partial derivative wavefields in Fig. 3 are also identified.

The final seismogram in Fig. 3 depicts the actual data residuals for the point diffractor seismogram, using an initial guess of a homogeneous model. The classical hyperbolic response of this point diffractor is clear. The negative polarity of this seismogram is due to our definition of the data residuals as u_i − d_i in eq. (7).

Using this insight into the appearance of the partial derivative seismograms, we now return to the computation of the gradient vector (19). This consists of forming the m×1 vector Re{Jᵗ δd*}, where the elements of J are the partial derivative wavefields. As can be seen from eq. (14), in the frequency domain each element of the vector Jᵗ δd* contains a partial derivative wavefield, sampled at the receiver points, multiplied with the conjugated (time-reversed) residual wavefield, δd, following which the real part is extracted. The corresponding operation in the time domain is zero-lag correlation (of two real-valued time-series). The residual seismogram (the time-domain expression of δd) will have a maximum correlation with the partial derivative seismogram emanating from the central node, and lesser correlations with other seismograms. This process of correlation with the point diffractor responses is closely related to the early migration work of Hagedoorn (1954) in his discussion of maximum convexity surfaces. Thus, Hagedoorn's migration of pre-stack data is equivalent to the first iteration of a gradient method.

For band-limited data, as the partial derivative wavefield is progressively time delayed and shifted in space with respect to the residual wavefield, the wavefields go in and out of phase, leading to decaying oscillations in their correlation. Thus, band-limited data cause 'leakage' in the correlations, resulting in a gradient that is not strictly localized at the true scattering location. The wider the aperture, the less leakage will occur in the correlations away from the central node.

For our test example, the resulting gradient vector using all 16 frequency components (after scaling using the correct step length) is shown in Fig. 4. From the previous discussion, the maximum element in the Re{Jᵗ δd*} vector should occur at the location of the point scatterer. We also expect oscillatory side lobe features as the partial derivative wavefields progressively correlate, then anti-correlate with the residual wavefield. Both these features are indeed observed in Fig. 4.

The gradient shown in Fig. 4 can be seen to represent an image of the original diffracting point, with the location of the point diffractor receiving the largest contribution (except at the source point where the source and receiver singularities combine). However, the image is not a particularly good one. This is partly because the image is equivalent to pre-stack migration of data from only a single source, but also because an update based simply on a single computation of the gradient, without any preconditioning, is fundamentally limited as a migration operator.

Spurious correlations between the partial derivative wavefields and the residuals are not the only source of misplaced parameter perturbations in the gradient vector. Events that arise from multiple scattering, although not present in our test example, would also spuriously correlate with the (single scattered) partial derivative wavefields, further corrupting the inversion results. In the next section we investigate to what extent Newton methods can improve this first iteration result by accounting for finite frequency effects and for irregular coverage of the target.

Newton methods of inversion

Newton methods are derived by considering an expansion of the misfit function in eq. (8) as a Taylor series and retaining terms up to quadratic order (Bertsekas 1982; Tarantola 1987):

E(p + δp) = E(p) + δpᵗ ∇pE(p) + ½ δpᵗ H δp + O(|δp|³) .  (26)

H is the m×m Hessian second-derivative matrix, the elements of which are given by

H_ij = ∂²E(p)/∂p_i ∂p_j ,  i = 1, 2, . . . , m ;  j = 1, 2, . . . , m .  (27)

We seek a vector δp that will locate the minimum within the quadratic approximation. For linear forward problems this approach will converge in one iteration. Minimizing with respect to all components of δp, we find the solution is characterized by

H δp = −∇pE  or  δp = −H⁻¹ ∇pE .  (28)
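As noted above, for a linear forward problem the expansion (26) is exact and the Newton step of eq. (28) reaches the minimum in a single iteration. This can be checked with a minimal NumPy sketch; the linear operator, true model and starting model below are arbitrary assumptions, not taken from the paper:

```python
import numpy as np

# Linear forward problem d = A p with misfit E(p) = 0.5*||A p - d||^2.
# Here the Hessian H = A^t A is constant, so eq. (28) is exact.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))          # assumed linear forward operator
p_true = rng.normal(size=4)
d = A @ p_true                       # noise-free synthetic data

p0 = np.zeros(4)                     # starting model
dd = A @ p0 - d                      # residuals, u - d
grad = A.T @ dd                      # gradient of the misfit
H = A.T @ A                          # Hessian (exact for a linear problem)
p1 = p0 - np.linalg.solve(H, grad)   # eq. (28): a single full Newton step
```

Because the predicted data are linear in p, the second-derivative term of the Hessian vanishes here, so the Newton and Gauss–Newton updates coincide and land exactly on the least-squares solution.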
Figure 3. Time-domain wavefields related to the model depicted in Fig. 2. These were computed by frequency-domain modelling (FDM) and Fourier synthesis. 16 frequencies between 0 and 25 Hz were used. (a)–(e) The 'partial derivative seismograms', computed with respect to perturbations at the five corresponding node points in Fig. 2. (f) The residual seismogram, that is the difference between the observed data in the true model and the predicted data in the homogeneous model. The residual wavefield has the opposite polarity from the partial derivative wavefields; this results from the definition of the data residuals as u_i − d_i in eq. (7).

In the gradient method of eq. (9) it is assumed that δp can be estimated by a scalar multiplied by the gradient ∇pE. Eq. (28) shows that a better estimate is the gradient, preconditioned, or filtered by the inverse Hessian. For this reason, in spite of the quasi-linearity of the problem examined using the gradient method in Fig. 4, the image is not optimal. Naturally, if gradient methods are applied iteratively, the inverse Hessian is effectively reconstructed during the iterative process. However, preconditioning using the inverse Hessian, or an approximation of the inverse Hessian, can yield improved convergence rates in iterative solutions.
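The two-step backpropagation recipe summarized earlier (one forward solve for u, one adjoint-side solve for v with the conjugated residuals as sources, then dot products with the virtual sources) can be sketched for any discrete problem S(p)u = f. The tridiagonal 'impedance' matrix below is a toy stand-in for the paper's FDM operator, an assumption used only for illustration:

```python
import numpy as np

OMEGA = 2.0  # assumed angular frequency for the toy operator

def impedance(p):
    # Toy complex-symmetric impedance matrix S(p); the diagonal depends
    # on the model p, and the 0.3j damping keeps S invertible.
    S = np.diag(-2.0 + (OMEGA / p) ** 2 + 0.3j)
    S += np.diag(np.ones(len(p) - 1), 1) + np.diag(np.ones(len(p) - 1), -1)
    return S

def dS_dp(p, i):
    # Analytic derivative of S with respect to the i-th model parameter.
    D = np.zeros((len(p), len(p)), complex)
    D[i, i] = -2.0 * OMEGA**2 / p[i] ** 3
    return D

def gradient_by_backpropagation(p, f, d_obs, rec):
    # Gradient of E = 0.5*||u[rec] - d_obs||^2 using two solves only.
    S = impedance(p)
    u = np.linalg.solve(S, f)              # predicted wavefield
    dd = np.zeros(len(p), complex)
    dd[rec] = u[rec] - d_obs               # residuals, zero-padded
    v = np.linalg.solve(S.T, np.conj(dd))  # backpropagated field, eq. (20)
    return np.array([np.real((-dS_dp(p, i) @ u) @ v) for i in range(len(p))])
```

Note that the number of solves is independent of the number of parameters m: the loop over i involves only matrix-vector products with the virtual sources, not further factorizations.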
Figure 4. The gradient vector (scaled using the correct step length) for the first iteration of an inversion of the data from the single anomaly model shown in Fig. 2. The gradient was computed using 16 frequencies between 0 and 25 Hz from a single source and 21 receivers (as shown in Fig. 2). This image can be considered to be roughly equivalent to pre-stack depth migration. Some of the artefacts that appear may be removed by iteration or by preconditioning using the Gauss–Newton method (see Fig. 6).

Eq. (28) suggests the Newton method for iterative solution

p^(k+1) = p^(k) − H⁻¹ ∇pE .  (29)

Newton methods are normally avoided in large inverse problems due to the large cost of computing and inverting the m×m Hessian matrix. Nevertheless it is instructive to pursue the analysis, since eq. (29) can be of use in problems that are deliberately underparametrized so that m is small (Xu, McMechan & Sun 1995). Also, as computers evolve, and as increasingly accurate FDM techniques are developed, the seismic inverse problem, when cast as an FDI problem, is no longer the insurmountably large problem it used to be. As we shall see, it will also be possible to recast eq. (29) in such a manner as to require the inversion of only an n×n matrix, instead of an m×m matrix given by eq. (27). This latter approach has relevance for problems in which the number of data is considerably smaller than the number of parameters [Delprat-Jannaud & Lailly (1992) have advocated using a large number of parameters in conjunction with appropriate regularization techniques in order to avoid discretization errors].

We can explicitly express each element of the Hessian matrix in eq. (27) by differentiating eq. (8):

H_ij = ∂²E/∂p_i ∂p_j
     = Re{ [∂u_1/∂p_i  ∂u_2/∂p_i  ⋯  ∂u_n/∂p_i] [∂u_1*/∂p_j, ∂u_2*/∂p_j, …, ∂u_n*/∂p_j]ᵗ
          + [∂²u_1/∂p_i ∂p_j  ∂²u_2/∂p_i ∂p_j  ⋯  ∂²u_n/∂p_i ∂p_j] [δd_1*, δd_2*, …, δd_n*]ᵗ } ,  (30)

from which, using eq. (11), one obtains

H = Re{Jᵗ J*} + Re{ [(∂Jᵗ/∂p_1) δd*  (∂Jᵗ/∂p_2) δd*  ⋯  (∂Jᵗ/∂p_m) δd*] }  (31)

or

H = Ha + R ,  (32)

where

Ha = Re{Jᵗ J*}  (33)

is the 'approximate Hessian', and

R = Re{ (∂Jᵗ/∂pᵗ)(δd*  δd*  ⋯  δd*) } .  (34)

There are m column vectors in the last term in eq. (34), each equal to δd* [note that the first term in eq. (34) implies a specific meaning for the partial differentiation of a matrix with respect to a row vector].

The Gauss–Newton method of inversion

Of the two terms in the expression for the Hessian matrix (32), the first term, Ha, is straightforward to compute, whereas the second term, R, is often difficult to compute. Moreover, quoting from Tarantola (1987): 'The last term is small if: i) the residuals are small, or ii) the forward equation is quasi linear. As in Newton methods we never need to know the Hessian with great accuracy, and as the last term in [eq. (32)] is, in general, difficult to handle, it is generally dropped off'. If we neglect the second term (which clearly is only important if changes in the parameters cause a change in the partial derivative wavefields), we obtain the Gauss–Newton formula

p^(k+1) = p^(k) − Ha⁻¹ ∇pE  or  δp = −Ha⁻¹ ∇pE  (35)
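For a non-linear forward model the update of eq. (35) is iterated. A compact sketch with a complex-valued toy forward model (the exponential 'phase' model and the matrix G are assumptions chosen only so that the Jacobian is analytic; they are not the paper's forward problem):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(12, 3))            # assumed fixed geometry matrix
p_true = np.array([0.20, -0.10, 0.05])
d = np.exp(1j * (G @ p_true))           # noise-free complex observations

def forward(p):
    u = np.exp(1j * (G @ p))            # predicted complex data
    J = 1j * G * u[:, None]             # analytic Jacobian, J_kj = i G_kj u_k
    return u, J

p = np.zeros(3)                         # starting model
for _ in range(15):
    u, J = forward(p)
    dd = u - d                          # residuals
    grad = np.real(J.T @ np.conj(dd))   # gradient, Re{J^t dd*}
    Ha = np.real(J.T @ np.conj(J))      # approximate Hessian, eq. (33)
    p = p - np.linalg.solve(Ha, grad)   # Gauss-Newton update, eq. (35)
```

In this particular toy, |u_k| = 1, so Ha reduces to GᵗG and no damping is needed; for an ill-conditioned Ha the damped form of eq. (36) below would replace the plain solve.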
(eq. 35 can also be obtained by linearizing the forward problem and applying the least squares 'normal equation'). The full Newton method, to be discussed in the next section, differs only from the Gauss–Newton method by the inclusion of the second term in the Hessian.

In eq. (35) we have assumed that the Hessian matrix has full column rank, and is thus invertible. Generally this will not be the case. The Hessian matrix is often either ill-conditioned or actually singular. To improve and stabilize the Gauss–Newton method for non-linear problems, some form of regularization will be required. Perhaps the simplest form of regularization is to apply a damping term to the Hessian before inverting it:

δp = −(Ha + λI)⁻¹ ∇pE  (36)

[suggested originally by Levenberg (1944) and Marquardt (1963)]. However, a more general approach is the stochastic non-linear least squares approach of Tarantola & Vallette (1982), which reduces to the Levenberg–Marquardt algorithm when diagonal covariance matrices are used to describe the (Gaussian) data and model statistics. When strong damping is used (λ ≫ 0), the damping term dominates and the Gauss–Newton method approaches the gradient method.

Where we invoke regularization, we can represent the approximate inverse thus obtained as the 'pseudo-inverse', H⁻. The parameter estimates δp̂ can be related to the true δp using

δp̂ = −H⁻ ∇pE = Y δp ,  (37)

where

Y = H⁻ Re{Jᵗ J*}  (38)

is the resolution matrix in the linear approximation (see e.g. Ory & Pratt 1995).

A physical interpretation of the approximate Hessian

In this section we examine the role the partial derivative seismograms play in the Gauss–Newton method by returning to the point diffractor model of Fig. 2. We have already examined the role the partial derivative wavefields play in the computation of the gradient vector. These concepts can also be used to understand the structure of the approximate Hessian used in the Gauss–Newton method and the exact Hessian used in the full Newton method (see next section). In order to solve eq. (35), in addition to the gradient vector we must also form the m×m approximate Hessian matrix Ha = Re{Jᵗ J*}.

A similar argument to that used in our discussion of the physical meaning of the gradient vector can be applied to understand the structure of the Ha matrix. As can be seen from eq. (30), each element in Ha = Re{Jᵗ J*} is the scalar product of two partial derivative wavefields at the receiver locations, one of which is conjugated. This operation corresponds to zero-lag correlation in the time domain. From Fig. 3, it should be clear that the partial derivative wavefields are largely uncorrelated with each other (and of course perfectly self-correlated). In the high-frequency limit, these wavefields would be perfectly uncorrelated with each other. However, because the frequency content is finite, the partial derivative wavefields from adjacent nodes are in fact correlated to some extent. Thus the Ha matrix is diagonally dominant, due to the autocorrelations occurring on the main diagonal, and banded due to the finite frequency effects. Furthermore, as the scattering points are removed from the source location, the partial derivative wavefields drop in amplitude, and so too the corresponding elements of the approximate Hessian will drop in amplitude.

Due to excessive cost, the Hessian matrix is not normally computed when the forward model is computed using time-domain finite-difference methods. However, by using the FDM/FDI method, and by taking advantage of the recent efficiency advances made in FDM methods (as well as advances made recently in the speed and memory configurations available on desktop workstations), we are able to compute the approximate Hessian exactly for the FDM/FDI problem. Fig. 5 shows the (symmetric) approximate Hessian matrix, Ha, computed using the homogeneous model in Fig. 2. (The structure of the matrix naturally depends on the ordering used for the node points, which, for the purposes of illustration here, is a straightforward row ordering scheme.) The diagonal dominance of the matrix with the off-diagonal bands is clear. The source and receiver locations also affect the structure of the approximate Hessian, due to the geometrical spreading in amplitudes of the partial derivative wavefields. Fig. 5 was computed with a single source in the model, resulting in large correlation values for all elements of the approximate Hessian associated with the source location and its nearest neighbours. The situation would be more complicated if the background velocity model were not homogeneous; however, it is generally true that the approximate Hessian matrix will be diagonally dominant and banded.

The approximate Hessian matrix thus predicts the defocusing that affects the gradient vector due to the incomplete and uneven illumination of the target, and the natural defocusing that occurs due to the limited bandwidth of the seismic experiment. The structure of the approximate Hessian matrix (Fig. 5) is similar to a convolutional smoothing operator. The application of the inverse Hessian thus sharpens, or focuses, the gradient vector. One could argue that much subsequent work in migration since Hagedoorn's work has been directed at approximating the inverse Hessian to improve the focusing properties of migration. In the work of Lambaré et al. (1992) and others, the inverse Hessian is computed under the ray theoretical approximation, and this is used as an operator that focuses the gradient estimates.

The filtering effect of the inverse of the approximate Hessian on the gradient for our test model (Fig. 2) is shown in Fig. 6. This image depicts the first iteration of a Gauss–Newton inversion. The removal of some of the artefacts seen on the gradient image (Fig. 4) is clear, although the image still suffers from inadequate source–receiver coverage and low-frequency content. It is important to point out that a similar result to that shown in Fig. 6 could have been obtained using the gradient method by iterating sufficiently.

Eq. (36) can thus be interpreted as an m×m filtering operation that is applied to focus the gradient vector, with a damping term to improve stability. However, there is another interpretation, which becomes evident if we examine an equation equivalent to eq. (36):

δp = −(Kᵗ K + λI)⁻¹ Kᵗ δd′ ,  (39)

where

K = [ Re{J} ; Im{J} ]  and  δd′ = [ Re{δd} ; Im{δd} ] .
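Eqs (36), (39) and (40) rest on two small matrix identities: the stacked real matrix K satisfies KᵗK = Re{JᵗJ*} (and Kᵗδd′ = Re{Jᵗδd*}), and the damped model-space and data-space solutions coincide. Both can be checked directly; the random complex J and residuals below are placeholders, not seismic quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 12                        # fewer data than parameters, n << m
J = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
dd = rng.normal(size=n) + 1j * rng.normal(size=n)
lam = 0.1

K = np.vstack([J.real, J.imag])     # 2n x m real stacking of eq. (39)
ddp = np.concatenate([dd.real, dd.imag])

Ha = np.real(J.T @ np.conj(J))      # approximate Hessian, eq. (33)
grad = np.real(J.T @ np.conj(dd))   # gradient, Re{J^t dd*}

# eq. (39): model-space damped update (inverts an m x m matrix)
dp_model = -np.linalg.solve(K.T @ K + lam * np.eye(m), K.T @ ddp)
# eq. (40): data-space damped update (inverts a 2n x 2n matrix)
dp_data = -K.T @ np.linalg.solve(K @ K.T + lam * np.eye(2 * n), ddp)
```

The second identity follows from Kᵗ(KKᵗ + λI) = (KᵗK + λI)Kᵗ; for n ≪ m the data-space form inverts the much smaller matrix, as the text notes.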
Figure 5. A representation of the approximate Hessian matrix, Re{Jᵗ J*}, for the model shown in Fig. 2. (a) The original 22×23 = 506 element grid, and a detailed section of the approximate Hessian showing the organization of the matrix. (b) The total 506×506 element approximate Hessian for the problem. Each pixel in the matrix measures the correlation between partial derivative wavefields emerging from the corresponding node points and another node point. The wavefields are best correlated along the main diagonal, but non-zero, oscillatory behaviour is observed off the main diagonal.
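The structure described in the caption above (zero-lag correlations of partial derivative wavefields, largest along the main diagonal) makes Ha a Gram matrix, hence symmetric and positive semi-definite. A toy illustration of forming Ha from receiver-sampled partial derivative wavefields; the tridiagonal operator, source and receiver geometry are all assumptions:

```python
import numpy as np

OMEGA = 2.0

def impedance(p):
    # Toy complex-symmetric impedance matrix (assumed stand-in for FDM).
    S = np.diag(-2.0 + (OMEGA / p) ** 2 + 0.3j)
    S += np.diag(np.ones(len(p) - 1), 1) + np.diag(np.ones(len(p) - 1), -1)
    return S

m = 10
p = np.linspace(1.2, 1.9, m)
f = np.zeros(m, complex); f[0] = 1.0          # source at node 0 (assumed)
rec = [6, 7, 8, 9]                            # assumed receiver nodes

S = impedance(p)
u = np.linalg.solve(S, f)
J = np.zeros((len(rec), m), complex)          # partial derivative wavefields,
for i in range(m):                            # sampled at the receivers
    dS = np.zeros((m, m), complex)
    dS[i, i] = -2.0 * OMEGA**2 / p[i] ** 3
    J[:, i] = np.linalg.solve(S, -dS @ u)[rec]

Ha = np.real(J.T @ np.conj(J))                # approximate Hessian, eq. (33)
```

Because each column of J is a virtual-source response, rows of Ha associated with nodes far from the source and receivers carry smaller correlations, mirroring the amplitude decay discussed in the text.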
Figure 6. Image computed from data in the model shown in Fig. 2 using a single iteration of the Gauss–Newton method. This image was computed from the same data as the single-iteration gradient image shown in Fig. 4. The artefacts in the gradient image have been effectively removed by filtering the gradient with the inverse of the approximate Hessian matrix (Fig. 5).

K and δd′ are real-valued matrices. Eq. (39) can be recast as

δp = −Kᵗ (K Kᵗ + λI)⁻¹ δd′  (40)

[see Tarantola (1987) (problem 1.19) for a proof of this manipulation]. In eq. (40) a 2n×2n (real-valued) filter is applied directly to the 2n (n real and n imaginary) residuals before backpropagation. In problems in which n ≪ m (underdetermined problems) the use of this approach will reduce the size of the matrix that needs to be inverted, although the computation of all m partial derivatives is still required. This is very closely related to the filtered backpropagation algorithm developed by Devaney (1982) (see also Wu & Toksöz 1987) in order to implement diffraction tomography. Eq. (40) thus appears to be the generalization of diffraction tomography to any background medium, with any source–receiver geometry and for any wave equation (acoustic, elastic, electromagnetic, etc.) that can be approximated by an FDM method as in eq. (5).

The full Newton method of inversion

We now return to the exact expression of the Hessian matrix, eq. (32). If we use this exact expression in the Newton algorithm for inversion, eq. (29), we obtain the full Newton method

δp = −(Ha + R)⁻¹ ∇pE ,  (41)

where

R = Re{ (∂Jᵗ/∂pᵗ)(δd*  δd*  ⋯  δd*) } .  (42)

Most researchers tend to avoid the computation of the second term, which is awkward to calculate and which in any case should be small (provided the problem is approximately linear, which, in practice, implies that the starting model is sufficiently close to the true answer). However, we have found that in the FDI problem it is no more difficult to compute the second term than the first. In the next two sections we show how a backpropagation approach similar to that used to compute the gradient vector can be used. The analysis leads directly to an interpretation of the matrix R in terms of second-order scattering (i.e. first-order multiples).

Calculation of the exact Hessian

The exact Hessian consists of two terms, the approximate Hessian Ha, which we have already discussed, and R, defined in eq. (42). From eq. (30) the elements of R are

R_ij = Re{ (∂²uᵗ/∂p_i ∂p_j) δd̂* } ,  i = 1, 2, . . . , m ;  j = 1, 2, . . . , m ,  (43)

where we now augment δd with l − n zeros, as in eq. (14). It is clear from this expression [and by analogy with eq. (13) for the gradient] that elements in R are computed by cross-correlating second-order partial derivatives with the data residuals. The first-order partial derivatives are related to first-order scattering; it follows that the second-order partial derivatives are related to second-order scattering. We will show the exact form of this relationship in the next few paragraphs.

In an earlier section we gave analytical formulae for the first-order derivatives in terms of first-order virtual sources. We now develop an analytical formula for the second-order partial derivatives in terms of second-order virtual sources. The second-order partial derivatives may be computed by pursuing the analysis begun earlier:

S (∂u/∂p_i) = −(∂S/∂p_i) u  (44)

(eq. 15 again). Taking the derivative of both sides of eq. (44) with respect to p_j yields

S (∂²u/∂p_j ∂p_i) + (∂S/∂p_j)(∂u/∂p_i) = −(∂S/∂p_i)(∂u/∂p_j) − (∂²S/∂p_j ∂p_i) u .  (45)

Solving eq. (45) for the required second derivatives yields

S (∂²u/∂p_j ∂p_i) = −f^(ij)  or  ∂²u/∂p_j ∂p_i = −S⁻¹ f^(ij) ,  (46)

where we have introduced a new, second-order virtual source term

f^(ij) = (∂S/∂p_i)(∂u/∂p_j) + (∂S/∂p_j)(∂u/∂p_i) + (∂²S/∂p_j ∂p_i) u .  (47)
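Eqs (44)-(47) translate directly into code: one solve for u, m solves for the first-order fields, and one further solve per (i, j) pair using the second-order virtual source. The toy impedance matrix below is an assumed stand-in for the paper's FDM operator; the result can be checked against finite differences of u:

```python
import numpy as np

OMEGA = 2.0

def impedance(p):
    # Toy complex-symmetric impedance matrix S(p) (an assumption).
    S = np.diag(-2.0 + (OMEGA / p) ** 2 + 0.3j)
    S += np.diag(np.ones(len(p) - 1), 1) + np.diag(np.ones(len(p) - 1), -1)
    return S

def dS(p, i):
    D = np.zeros((len(p), len(p)), complex)
    D[i, i] = -2.0 * OMEGA**2 / p[i] ** 3     # dS/dp_i
    return D

def d2S(p, i, j):
    D = np.zeros((len(p), len(p)), complex)
    if i == j:                                # point parametrization: the
        D[i, i] = 6.0 * OMEGA**2 / p[i] ** 4  # cross terms vanish, as in the text
    return D

n = 6
p = np.linspace(1.4, 1.8, n)
f = np.zeros(n, complex); f[0] = 1.0
S = impedance(p)
u = np.linalg.solve(S, f)
du = [np.linalg.solve(S, -dS(p, i) @ u) for i in range(n)]   # eq. (44)

def d2u(i, j):
    # Second-order virtual source, eq. (47), then one more solve, eq. (46).
    fij = dS(p, i) @ du[j] + dS(p, j) @ du[i] + d2S(p, i, j) @ u
    return np.linalg.solve(S, -fij)
```

The two first-order terms in f^(ij) are the doubly scattered contributions (node i then j, and node j then i); for this diagonal parametrization the ∂²S term contributes only when i = j.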
Frequency^space seismic waveform inversion 353

Thus, as for the ¢rst derivatives, the second-order partial although R does correctly predict the presence of ¢rst-
derivatives can also be generated by solving a forward problem order multiple energy in the gradient vector, it will also
using a virtual source term. There are three terms in the virtual produce signi¢cant false predictions, especially near source or
sources de¢ned by eq. (47). The ¢rst term, (LS/Lpi )(Lu/Lpj ) is a receiver locations in the model. These spurious parameter
virtual source at the ith node, excited by the partial derivative perturbations may be such that the mis¢t cannot be further
wave¢eld from the jth node. Since the partial derivative reduced when incorporating the additional term in the
wave¢eld is itself a ¢rst-order scattered wave¢eld, this new Hessian, in which case the matrix R is not helpful. In such a
virtual source term generates the double scattered wave¢eld or case, the matrix R can destroy the positive de¢niteness of the
¢rst-order multiple from these two node points. Similarly, approximate Hessian. If the Hessian is no longer positive
(LS/Lpj )(Lu/Lpi ) is a virtual source at the jth node, excited by de¢nite, then our estimate of the mis¢t function E(p) is not
the partial derivative wave¢eld from the ith node. The ¢nal locally convex, and gradient methods are preferable.
term, (L2 S/Lpj Lpi )u, depends on the parametrization and the
exact details of the ¢nite-approximation scheme. In most cases,
E¤cient calculation of the exact Hessian
where orthogonal basis functions are used to parametrize the
model (such as the point collocation scheme), this term will Computing all possible second-order derivatives using eq. (46)
equal zero unless i~j. We may therefore think of L2 u/Lpj Lpi would require m2 forward problems to be solved, in addition to
as the sum of two doubly scattered wave¢elds, the ¢rst the m forward problems that are required to compute the
representing scattering from node j, then from node i, and the virtual sources (one for each partial derivative wave¢eld) in
second representing scattering from node i, then from node j. eq. (47). However, eq. (43) only requires that we know the
Returning to eq. (43), we see that the i, jth element of R action of the second derivative matrix on the residuals, dd.
represents the sum of the doubly scattered wave¢elds, each Taking the transpose of eq. (46) yields
conjugated, and multiplied by the data residuals. Thus Rij is " #t
a measure of the correlation of the residuals with second-order L2 u
~{[f (ij) ]t [S{1 ]t . (49)
scattered events. The e¡ect on the gradient of ¢rst-order Lpi Lpj
multiples, as a speci¢c class of double scattered events, will
therefore be included in the correlation information contained Substituting eq. (49) into eq. (43) yields
in R. It should also be noted that the diagonal elements Rii can  t
Rij ~{Re{ f (ij) v} , (50)
take on non-zero values when the scattered waves are re£ected
back to the scattering node to be rescattered, such as when where v is de¢ned, as before, by
there is a free surface in the model.
v~[S{1 ]t ddê  (51)
With this understanding of the elements of R, let us
re-examine the full Newton estimate of the parameter updates, (eq. 20 again). By analogy with the computation of the
{1 gradient vector, each element of R is computed in two steps.
(Ha zR) dp~{+p E or dp~{(Ha zR) +p E . (48)
(1) The wave¢eld, v, is computed by backpropagation (this ¢eld
The two components of the Hessian sum to act as a ¢lter that may already be available following the computation of the
relates the parameter updates to the gradient vector. We have gradient vector). (2) The backpropagated wave¢eld is multi-
seen that the ¢rst term, Ha ~Re{Jt J}, predicts the defocusing plied by the second-order virtual sources (created by the ¢rst-
inherent in the correlation technique due to ¢nite frequencies order scattered wave¢elds), following which the real (zero
and uneven source^receiver coverage. We now see that the phase) component is extracted. Thus this step requires the
second term, R, predicts the artefacts in the gradient vector solution of m additional forward problems, required to
that occur due to double scattered events. The inverse of compute the second-order virtual sources from the ¢rst-
the full Hessian therefore acts as a deconvolution operator order partial derivatives using eq. (47) (and these ¢rst-order
that now includes terms that operate on ¢rst-order multiples. partial derivatives may already be available following the
The second term in the Hessian, usually neglected, therefore computation of the approximate Hessian). Although m may be
has the potential for acting as a de-multiple operator, even a large number of forward problems to solve, it is much less
in the ¢rst iteration. Gradient methods applied iteratively than m2 . We noted earlier that there are cases in which the
would eventually take care of the same e¡ects, but the full problem may be initially characterized by a reduced parameter
Newton method has the potential for converging more set, so that m may be quite small. We illustrate such a problem
rapidly. Nevertheless, the convergence of any of these methods in the next section of this paper.
in the presence of multiple energy is entirely dependent on
being within the region of the global minimum of the mis¢t
T HE F U L L N EW TO N M ET HOD I N
functionöand the higher the order of multiples in the data, the
E ST I M AT I NG BAC KG RO U N D V E L O C I T I E S
more di¤cult it is to ensure that this condition is met.
As previously discussed, the correlation technique for The introduction of the approximate and full Hessian matrices
the gradient vector produces false anomalies by mistakenly in the previous sections, and the e¤ciency of the FDM/FDI
correlating energy from multiple scattering in the residuals methods, allow one to consider applying the full Newton
with the partial derivative wave¢elds. Similarly, in the com- method to realistic, full-sized seismic-waveform inversion
putation of R, single scattered energy in the residuals is mis- problems. For the full Newton method to be of utility, the
takenly correlated with the second-order partial derivative problem should meet the following two criteria: (1) it must be
wave¢elds. This erroneous correlation is far more signi¢cant possible to parametrize the problem using only a small number
for the latter case, as single scattered energy is much larger of parameters, in order to avoid having to compute and
in amplitude than double scattered energy. Consequently, invert large Hessian matrices, and (2) the problem should be

ß 1998 RAS, GJI 133, 341^362

GJI000 21/4/98 09:15:58 3B2 version 5.20


The Charlesworth Group, Huddersfield 01484 517077
354 R. G. Pratt, C. Shin and G. J. Hicks

quasi-quadratic, with significant second-order non-linearities.

A classical problem that meets both these conditions is that of estimating the smoothly varying, low-wavenumber `background' velocity in the inversion of seismic-reflection data. Such problems have been the subject of numerous investigations (Mora 1987b, 1989; Snieder et al. 1989; Cao et al. 1990; Symes & Carazzone 1991; Chavent & Jacewitz 1995; Jervis, Sen & Stoffa 1996; Varela, Stoffa & Sen 1996). In this section we illustrate the use of the gradient, Gauss-Newton and full Newton methods in helping to solve this problem effectively.

The velocity model used in this study is shown in Fig. 7(a). The velocity varies only with depth, and consists of a low-wavenumber (background) component, combined with a higher-wavenumber (reflective) component. For simplicity, in this study the density is assumed to be constant (and known). We synthesized a reflection experiment over the top of this model using a single source and a split spread of 28 receivers, with a maximum offset of 700 m. These synthetic data are shown in Fig. 7(b), along with several traces representing the high-wavenumber model in time, after convolving with the source wavelet. The source wavelet has a dominant frequency of 10 Hz and a maximum frequency of 25 Hz. There is a good correlation between the zero-offset trace and the high-wavenumber model, although there is a decay in the data amplitudes with time due to geometrical spreading, and there is some evidence of interbed multiples in the data.

The objective in this simple numerical experiment is to extract as much information as possible about the full spectrum of the 1-D velocity model from the recorded data, starting from a single, reasonable velocity gradient model. One might initially expect that the problem is amenable to a straightforward application of iterative inversion methods. Both high and low wavenumbers are represented in the data: the high wavenumbers generate the reflections in the first place, and control the zero-offset time and the amplitudes of the reflections; the low wavenumbers control the change in traveltimes of the reflections with offset (the seismic `moveout'). However, it has long been recognized that simple gradient-type methods applied to reflection data of this type either fail to converge, or fail to converge sufficiently rapidly on the long-wavelength components of the velocity field (Gauthier et al. 1986; Mora 1987a,b, 1989).

A partial solution to the low-wavenumber convergence problem is to formulate an additional and entirely separate inversion step designed to solve only for low wavenumbers. However, the low wavenumbers may fail to converge on the correct solution due to the existence of local extrema in the misfit function. Local extrema may be encountered, for example, with an incorrect velocity model for which the predicted zero-offset time of one arrival coincides with the measured zero-offset time of a different arrival. Sen & Stoffa (1991) and Varela et al. (1996) suggested using simulated annealing to overcome this difficulty. Simulated annealing, however, requires a large number of forward models to be evaluated. The low-wavenumber inversion problem is also strongly coupled to the high-wavenumber problem, since perturbing the background velocity perturbs not only the moveout of the reflections, but also the zero-offset two-way traveltimes. Snieder et al. (1989) showed that a reparametrization of the high-wavenumber function from depth to two-way vertical traveltime can be used to help decouple the low- and high-wavenumber inversion problems. In this approach, the zero-offset times remain unchanged as the low wavenumbers are altered, and for this reason local extrema are much less likely to be encountered. However, the problem is still strongly non-linear. Snieder et al. (1989) and Cao et al. (1990) suggested a simplex search algorithm for the low wavenumbers.

In this example we adopt two of the suggestions described in the previous paragraph: we separate the high- and low-wavenumber problems by formulating distinct inversion steps for each of these, and we reparametrize the high-wavenumber inversion problem from depth to time, in order to decouple the two inverse problems. With such an approach, we now show how Newton methods can be effectively used to converge on the low wavenumbers in the problem. We parametrize the low-wavenumber velocity field using a small number of cubic spline node points (after Cao et al. 1990, for example). In order to avoid complication, we assume from here on that the high-wavenumber component of the velocity field is known in two-way time (but not in depth). This is not a drastic assumption, as a good reflectivity estimate in two-way time can usually be recovered in a straightforward manner, either by using a simple stack of the data, or, in the case of lateral velocity variations, by applying the standard method of seismic time migration.

The algorithm used to compute the update to the low-wavenumber velocity field for a given iteration is outlined below.

Figure 7. (a) Depth model used to test the performance of the gradient and Newton methods in estimating the `background' velocity. The model is built using a combination of a low-wavenumber (background) velocity field (dashed) and a high-wavenumber (reflectivity) velocity field. The sum of these two components is the total velocity-depth function (solid). (b) Synthetic seismic-reflection data computed for the model shown in (a) (the direct arrivals have been suppressed on this plot for clarity). The central traces in the panel represent the high-wavenumber velocity model in two-way time after convolution with the source signature.
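The decoupling device described above (holding the high-wavenumber reflectivity component fixed in two-way vertical traveltime while the background velocity is updated) can be sketched numerically as follows. This is an illustrative numpy sketch rather than the authors' code, and the function names are our own:

```python
import numpy as np

def twoway_time(z, c_bg):
    """Two-way vertical traveltime t(z) = 2 * cumulative sum of dz / c_bg(z)."""
    dz = np.diff(z, prepend=z[0])
    return 2.0 * np.cumsum(dz / c_bg)

def restretch_reflectivity(z, c_bg_old, c_bg_new, r_old):
    """Keep the reflectivity fixed in two-way time: a depth sample under the
    new background takes the reflectivity value that had the same two-way
    time under the old background."""
    t_old = twoway_time(z, c_bg_old)
    t_new = twoway_time(z, c_bg_new)
    return np.interp(t_new, t_old, r_old)
```

With a stretch of this kind applied after every background update, the zero-offset times of the reflections are preserved, which is the property exploited when the partial derivatives are computed in the algorithm that follows.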
(1) We begin by explicitly calculating the required partial derivatives. The elements of the $n \times m$ Fréchet derivative matrix are defined as

$J_{ij} = \partial u_i / \partial p_j\,, \quad i = 1, 2, \ldots, n\,;\ j = 1, 2, \ldots, m$  (52)

(eq. 11 again), where there are n data and m model parameters. In this study we use five frequency components of the data (equally spaced between 5 and 25 Hz), thus for the 28 receivers there are $n = 28 \times 5 = 140$ data. We parametrize the low-wavenumber model using the velocity values at each of four cubic spline node points, thus for the low-wavenumber problem there are only $m = 4$ model parameters. We compute the partial derivatives (the columns of J) directly, by individually perturbing each of the four low-wavenumber parameters. During the perturbation, the high-wavenumber velocity model is stretched vertically in such a manner as to maintain the vertical two-way traveltimes (this effectively reparametrizes the problem from depth, to time, after Snieder et al. 1989). In this manner, the data perturbations contain perturbed moveouts, but unperturbed zero-offset traveltimes. The effects of the high- and low-wavenumber components of the velocity field on the data are thus decoupled.

(2) The (low-wavenumber) gradient vector is computed using

$\nabla_{\bf p} E = {\rm Re}\{{\bf J}^t\, \delta{\bf d}^*\}$  (53)

(eq. 10 again). If we are using the gradient method to solve for the low wavenumbers, our model update is obtained by stopping here, computing a step length, and applying it to the gradient vector $\nabla_{\bf p} E$ using

$\delta{\bf p} = -\alpha^{(k)}\, \nabla_{\bf p} E^{(k)}\,,$  (54)

as in eq. (9).

(3) If we are using a Gauss-Newton method, we now compute the approximate Hessian matrix, given by

${\bf H}_a = {\rm Re}\{{\bf J}^t {\bf J}^*\} + \lambda {\bf I}\,,$  (55)

where we include a damping factor $\lambda$, estimated from the square root of the sum of the squares of the elements of J (Dimri 1992). The partial derivatives are available, as we have already computed them explicitly when computing the gradient vector in the previous step.

(4) The (unscaled) Gauss-Newton parameter perturbations can then be computed:

$\delta\hat{\bf p}^{(1)} = -{\bf H}_a^{-1}\, \nabla_{\bf p} E$  (56)

(eq. 35 again, with the superscripts in the notation as in eq. B9). The damping used in eq. (55) causes the length of $\delta\hat{\bf p}^{(1)}$ to be underestimated. Therefore, as for the gradient perturbation, we need to compute a step length and apply it to this perturbation vector.

(5) In order to extend the inversion to the full Newton method, an additional perturbation direction needs to be computed. This contains the contribution of the second (non-linear) term in the exact Hessian. We use the (unscaled) second-order perturbation

$\delta\hat{\bf p}^{(2)} \approx -{\bf H}_a^{-1}\, {\bf R}\, \delta\hat{\bf p}^{(1)}\,.$  (57)

This equation is obtained from the series expansion in the last line of eq. (B14) of Appendix B. Eqs (56) and (57) relate to the first and second terms in this expansion, respectively. The matrix R is computed using the backpropagation method of eq. (48), requiring us to solve (only) one more forward problem to compute the backpropagated wavefield.

(6) As yet the relative scale factors for the two terms, $\delta\hat{\bf p}^{(1)}$ and $\delta\hat{\bf p}^{(2)}$, are undetermined. The optimal full Newton parameter perturbation vector lies in the plane defined by the vectors given by eqs (56) and (57). It may be fully described by

$\delta\hat{\bf p} = \alpha\, \delta\hat{\bf p}^{(1)} + \beta\, \delta\hat{\bf p}^{(2)}\,.$  (58)

The direction of this perturbation vector depends on the relative scale factors, $\alpha$ and $\beta$. These are obtained by solving a 2-D subspace Gauss-Newton inversion:

$[\alpha,\ \beta]^t = ({\rm Re}\{{\bf A}^t {\bf A}^*\} + \lambda {\bf I})^{-1}\, {\rm Re}\{{\bf A}^t\, \delta{\bf d}^*\}\,,$  (59)

where A is an $n \times 2$ matrix containing the two partial derivative seismograms produced by perturbing the current velocity field in the directions of $\delta\hat{\bf p}^{(1)}$ and $\delta\hat{\bf p}^{(2)}$. Damping is again introduced due to the ill conditioning of the projected Hessian, ${\bf A}^t {\bf A}^*$. As a result, we again require a step-length computation for this perturbation vector, before applying it to the low-wavenumber model.

(7) The final low-wavenumber velocity update given by eqs (53) (for the gradient method), (56) (for the Gauss-Newton method) and (58) (for the full Newton method) is found (after the computation of a step length in each case) by interpolating between the m node points, using cubic splines. In each case, after the low-wavenumber perturbation has been used to update the low-wavenumber velocity field component, the high-wavenumber velocity model is stretched so as to preserve the vertical two-way traveltime information, forming a final new velocity-depth function.

The results of applying these algorithms to the example problem are shown in Fig. 8. The first panel, Fig. 8(a), shows the unscaled, low-wavenumber perturbation functions computed for (1) the gradient, (2) the Gauss-Newton perturbation, $\delta\hat{\bf p}^{(1)}$, and (3) the second-order perturbation function, $\delta\hat{\bf p}^{(2)}$ (that is the terms given by eqs 53, 56 and 57, respectively, before the computation of a step length). The perturbation functions are constrained to zero within the water layer (we assume we know the water depth and velocity). The gradient term is clearly dominated by the near-surface contribution, a direct result of the decreasing sensitivity of the data to velocity perturbations with depth. The perturbation function derived from the Gauss-Newton method is more hopeful: here the perturbation function peaks close to the depth at which the starting model is most in error, and is negative below the centre of the model. However, the Gauss-Newton perturbation function, like the gradient term, does not contain a significant contribution from the velocity node in the deepest part of the model. We interpret this error as originating in the non-linearity of the problem; that is, the gradient term in the deepest part of the model varies significantly if the velocity field is changed in the shallower part of the model: higher-order terms are required to account for this variation. This increase in error with depth can be understood with reference to the effect of a velocity perturbation on the propagating wavefields used to compute the gradient. As we perturb the velocity field, we will perturb the downgoing wavefields. Since the gradient
Figure 8. (a) Unscaled perturbation functions for the low-wavenumber problem (normalized to the true maximum error of 400 m s$^{-1}$) for the gradient method (dotted line), the Gauss-Newton method (short-dashed line) and the contribution from the non-linear term in the full Newton method (long-dashed line). The circles depict the spline node points used. (b) Left: the low-wavenumber result of one, three and five iterations of the gradient method. Right: the total velocity model after three iterations of the gradient method. (c) As in (b), but using the Gauss-Newton method. (d) As in (b) and (c), but using the full Newton method.

term represents, effectively, the scaled multiplication of two such propagating wavefields, u and v (see eq. 22), the gradient will therefore be particularly affected at large depths. The second-order perturbation term $\delta\hat{\bf p}^{(2)}$ shown in Fig. 8(a) clearly responds to the mismatch in the velocities at the bottom of the model.

In Figs 8(b), (c) and (d) we depict the results of applying the gradient, Gauss-Newton and full Newton methods to recover the low wavenumbers of the velocity model. In each case, we show the first and third iteration results; for the gradient method we also show the fifth iteration result. For completeness, we also show the final results including the high-wavenumber components. Recall that we assumed that the high-wavenumber component was known as a function of two-way vertical traveltime; the high-wavenumber component was depth-corrected in each case using the current
low-wavenumber velocity estimate. It is evident from Fig. 8(b) that the gradient method, even after five iterations, fails to produce any significant correction to the deeper part of the low-wavenumber model. As a result, the total velocity-depth model, after inclusion of the high wavenumbers, is significantly in error. The Gauss-Newton approach, shown in Fig. 8(c), succeeds in restoring the low-wavenumber velocities everywhere except at the very bottom of the model in three iterations. The full Newton approach, in Fig. 8(d), succeeds in restoring the velocities to within approximately 80 m s$^{-1}$ down to a depth of 1600 m, still with only three iterations. As a result, the total velocity-depth model from the full Newton inversion is the closest to the true velocity-depth model. It should be noted that none of the methods succeeds in perfectly reconstructing the depth model between 600 and 900 m, due to the inadequacy of the four-node spline parametrization; naturally, in a real inversion we would proceed by introducing more nodes into the parametrization at this stage.

In Fig. 9 we depict the data residuals for the three results shown in Figs 8(b), (c) and (d), on the same scale as the original data for the experiment (in each case the result after three iterations is used). Although our inversions were carried out in the frequency domain, we computed these residuals by full time-domain simulation in the final models, and subtracting the result from the original synthetic data. It can be observed that the gradient method fails to account for much of the data: it has succeeded in accounting for some near-offset, shallow data, but the moveouts are not well accounted for, and the residuals tend to increase with offset. The Gauss-Newton method performs significantly better after three iterations, accounting for much more of the data, leaving only some energy in the residuals at about 800 ms and at about 1600 ms. The full Newton method also fails to reduce the data residuals at 800 ms, which is due to the inability of the parametrization to fit the low-wavenumber velocity model (remarked on in the previous paragraph). However, the full Newton method does succeed in reducing the residuals at 1600 ms, nearly to zero. This success appears to be due to the ability of the full Newton method to account for the changes in the partial derivatives of the low-wavenumber inversion problem due to the velocity perturbations.

In theory, both the gradient method and the Gauss-Newton method should eventually converge to the same result as the full Newton method, given a sufficient number of iterations. This, however, did not prove to be the case for the gradient method, which was iterated further. We showed the result after five iterations in Fig. 8(b); very little improvement was observed. This failure is in part due to the difficulty in accurately computing the step length: as the gradient method iterates, we find ourselves on either side of a long, narrow valley in the misfit function, unable to locate the narrow base of the valley (a more accurate, more expensive, step-length computation might alleviate this to some extent, as would a conjugate gradient implementation). The Gauss-Newton and full Newton methods succeed in locating a direction that points along the valley, rather than down the steep slope of the valley.

The cost of one iteration of each of the three methods is comparable: we compute the partial derivatives explicitly in each case (that is, we do not use backpropagation to compute the gradient, as this would not allow us to implement the time-depth conversion in the high wavenumbers). Since we have four model parameters, four forward models are required to compute the partial derivatives. The step length in each case requires one additional forward model to compute. The

Figure 9. (a) Reflection data for the model shown in Fig. 7(a) (Fig. 7b again). For clarity, the direct arrivals have been suppressed. (b) Data residuals after three iterations of the gradient method. (c) Data residuals after three iterations of the Gauss-Newton method. (d) Data residuals after three iterations of the full Newton method. All seismic traces are displayed in true amplitude, with the same scaling factor being used for all four panels. For reference purposes, in each of these panels the central traces depict the high-wavenumber part of the velocity model, convolved with the source signature.
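The update formulae in eqs (53)-(59), together with a single-trial parabolic step-length estimate of the kind referred to above, can be collected into a brief numerical sketch. This is illustrative Python (numpy), not the authors' code; the function names are our own, and the sign and conjugation conventions simply follow the equations as printed:

```python
import numpy as np

def newton_directions(J, dd, R, lam):
    """Eqs (53)-(57): J is the n x m complex Frechet-derivative matrix,
    dd the complex data-residual vector, R the second-order Hessian term."""
    grad = np.real(J.T @ np.conj(dd))                          # eq. (53)
    Ha = np.real(J.T @ np.conj(J)) + lam * np.eye(J.shape[1])  # eq. (55)
    dp1 = -np.linalg.solve(Ha, grad)                           # eq. (56)
    dp2 = -np.linalg.solve(Ha, R @ dp1)                        # eq. (57)
    return grad, dp1, dp2

def subspace_scales(A, dd, lam):
    """Eq. (59): damped 2-D subspace solve for the scale factors alpha, beta;
    A holds the two partial derivative seismograms as its columns."""
    M = np.real(A.T @ np.conj(A)) + lam * np.eye(A.shape[1])
    return np.linalg.solve(M, np.real(A.T @ np.conj(dd)))

def parabolic_step(E0, dE0, E_trial, t_trial):
    """One-trial step length: fit E(t) = E0 + dE0*t + c*t**2 through the
    misfit of a single extra forward model and return the minimizing t."""
    c = (E_trial - E0 - dE0 * t_trial) / t_trial**2
    return -dE0 / (2.0 * c) if c > 0 else t_trial
```

For the example in the text, the cost accounting follows directly: four forward models for the columns of J, one backpropagation for R (full Newton only), one forward model per trial misfit in `parabolic_step`, and two more for the columns of A.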
Gauss-Newton method thus requires exactly the same work as the gradient method (neglecting the trivial number of operations required for matrix inversion and matrix multiplication using the projected Hessian); the full Newton method requires an additional forward model to compute the backpropagation, plus two more forward models to compute the relative scaling factors, $\alpha$ and $\beta$ in eq. (59).

CONCLUSIONS

In this paper we have introduced a link between a discrete, frequency-space method (FDM) for modelling seismic-wave propagation, and the frequency-space inversion methods (FDI) that make use of the discrete modelling method. This linkage allows us to pursue a matrix formalism for the analysis of the inverse problem. Specifically, we have presented a new derivation of Lailly's (1983) fast method for computing the gradient of the misfit function by backpropagation, we have demonstrated the computation of the approximate Hessian, and we have given a new, fast backpropagation method for computing the additional term in the exact Hessian matrix. These theoretical results are applicable to any partial differential equation that can be discretized using a complex-valued impedance matrix, including viscoacoustic and viscoelastic equations (potentially including anisotropy), or, more generally, potentially useful for problems based on Maxwell's equations for electromagnetic fields or for potential field problems. The results hold for 1-D, 2-D and 3-D problems, and for 2.5-D problems. We have shown that the inverse Hessian can be manipulated to yield a filtered backpropagation algorithm, similar to that used in `diffraction tomography' by Devaney (1982) and by Wu & Toksöz (1987), but generally applicable to all the physical problems listed above.

For the acoustic 2-D seismic problem, we were able to compute numerically the approximate Hessian matrix for a realistic-sized problem in reflection seismology. We showed in a numerical example how the use of the inverse of the approximate Hessian would improve convergence in iterative methods by refocusing the gradient image of single point scatterers. Convergence is a critical issue in iterative inversion for large seismic problems; since the FDM method can compute partial derivatives rapidly by forward propagation from virtual sources, we believe this approach may find some application.

In the final section of this paper we presented a second numerical example. We used gradient, Gauss-Newton and full Newton methods to compute the low-wavenumber (background) velocity structure from simulated, multi-offset, surface seismic-reflection data. This is a rather classic problem in reflection seismology, of considerable importance. In our example we showed the potential that the Gauss-Newton and full Newton methods have in accelerating convergence of iterative solutions in such problems, without introducing excessive additional costs.

ACKNOWLEDGMENTS

We would like to acknowledge the helpful comments and suggestions made by Patrick Lailly and by an anonymous reviewer. GJH is supported by the Natural Environment Research Council (UK).

REFERENCES

Bathe, K.J. & Wilson, E.L., 1976. Numerical Methods in Finite Element Analysis, Prentice-Hall, NJ.
Bertsekas, D.P., 1982. Enlarging the region of convergence of Newton's method for constrained optimization, J. Optimization Theory Applications, 36, 221-251.
Beydoun, W.B., Delvaux, J., Mendes, M., Noual, G. & Tarantola, A., 1989. Practical aspects of an elastic migration/inversion of cross-hole data for reservoir characterization: a Paris basin example, Geophysics, 54, 1587-1595.
Beydoun, W.B., Mendes, M., Blanco, J. & Tarantola, A., 1990. North Sea reservoir description: benefits of an elastic migration/inversion applied to multicomponent vertical seismic profile data, Geophysics, 55, 209-218.
Bunks, C., Saleck, F.M., Zaleski, S. & Chavent, G., 1995. Multiscale seismic waveform inversion, Geophysics, 60, 1457-1473.
Cao, D., Beydoun, W.B., Singh, S.C. & Tarantola, A., 1990. A simultaneous inversion for background velocity and impedance maps, Geophysics, 55, 458-469.
Chavent, G. & Jacewitz, C.A., 1995. Determination of background velocities by multiple migration fitting, Geophysics, 60, 476-490.
Claerbout, J.F., 1985. Fundamentals of Geophysical Data Processing, Blackwell Scientific Publications, Oxford.
Crase, E., Pica, A., Noble, M., McDonald, J. & Tarantola, A., 1990. Robust elastic nonlinear waveform inversion: application to real data, Geophysics, 55, 527-538.
Delprat-Jannaud, F. & Lailly, P., 1992. Ill-posed and well-posed formulations of the reflection traveltime tomography problem, J. geophys. Res., 97, 19 827-19 844.
Devaney, A.J., 1982. A filtered backpropagation algorithm for diffraction tomography, Ultrasonic Imaging, 4, 336-350.
Dimri, V., 1992. Deconvolution and Inverse Theory: Application to Geophysical Problems, Methods in Geochemistry and Geophysics, Vol. 29, Elsevier, Amsterdam.
Farra, V. & LeBegat, S., 1995. Sensitivity of qP-wave traveltimes and polarization vectors to heterogeneity, anisotropy and interfaces, Geophys. J. Int., 121, 371-384.
Gauthier, O., Virieux, J. & Tarantola, A., 1986. Two-dimensional non-linear inversion of seismic waveforms: numerical results, Geophysics, 51, 1387-1403.
Goodman, T.R. & Lance, G.N., 1956. The numerical integration of two-point boundary value problems, Mathematical Tables and Other Aids to Computation, 10, 82-86.
Grechka, V.Y. & McMechan, G.A., 1997. Analysis of reflection traveltimes in 3-D transversely isotropic heterogeneous media, Geophysics, 62, 1884-1895.
Griewank, A., 1989. On automatic differentiation, in Programming: Recent Developments and Applications, pp. 83-108, eds Iri, M. & Tanabe, K., Kluwer Academic, Dordrecht.
Hagedoorn, J.G., 1954. A process of seismic reflection interpretation, Geophys. Prospect., 2, 85-127.
Jervis, M., Sen, M.K. & Stoffa, P.L., 1996. Prestack migration velocity estimation using nonlinear methods, Geophysics, 60, 138-150.
Jo, C.H., Shin, C.S. & Suh, J.H., 1996. Design of an optimal 9-point finite-difference frequency-space acoustic wave equation scheme for inversion and modeling, Geophysics, 61, 529-537.
Jurovics, S.A. & McIntyre, J.E., 1962. The adjoint method and its application to trajectory optimization, ARS J., 32, 1354.
Kennett, B., Sambridge, M.S. & Williamson, P.R., 1988. Subspace methods for large inverse problems with multiple parameter classes, Geophys. J., 94, 237-247.
Kolb, P., Collino, F. & Lailly, P., 1986. Pre-stack inversion of a 1-D medium, Proc. IEEE, 74, 498-508.
Lailly, P., 1983. The seismic inverse problem as a sequence of before stack migrations, in Conference on Inverse Scattering: Theory and Application, eds Bednar, J.B., Redner, R., Robinson, E. & Weglein, A., Soc. Industr. appl. Math., Philadelphia, PA.
Lambare, G., Virieux, J., Mandariaga, R. & Jin, S., 1992. Iterative Song, Z.-M. & Williamson, P.R., 1995. Frequency-domain acoustic-
asymptotic inversion in the acoustic approximation, Geophysics, 57, wave modeling and inversion of crosshole data: Part I^ 2.5-D
1138^1154. modeling method, Geophysics, 60, 784^795.
Lee, E.B. & Markus, L., 1967. Foundations of Optimal Control Theory, Song, Z.-M., Williamson, P.R. & Pratt, R.G., 1995. Frequency-domain
John Wiley, New York, NY. acoustic-wave modeling and inversion of crosshole data: Part II^
Levenberg, K., 1944. A method for the solutoin of certain nonlinear inversion method, synthetic experiments and real-data results,
problems in least squares, Q. appl. Math, 2, 164^168. Geophysics, 60, 796^809.
Liu, J.W. & George, A., 1981. Computer Solution of Large Sparse Stekl, I. & Pratt, R.G., 1997. Accurate visco-elastic modeling by
Positive De¢nite Systems, Prentice-Hall, NJ. frequency-domain ¢nite di¡ferences using rotated operators,
Marfurt, K.J., 1984. Accuracy of ¢nite-di¡erence and ¢nite-element Geophysics, 63, in press.
modeling of the scalar and elastic wave-equations, Geophysics, 49, Symes, W.W. & Carazzone, J.J., 1991. Velocity inversion by di¡erential
533^549. semblance optimization, Geophysics, 56, 654^663.
Marfurt, K.J. & Shin, C.S., 1989. The future of iterative modeling of Tarantola, A., 1984. Inversion of seismic re£ection data in the acoustic
geophysical exploration, in Supercomputers in Seismic Exploration, approximation, Geophysics, 49, 1259^1266.
ed. Eisner, E., Seis. Expl., 21, 203^228. Tarantola, A., 1986. A strategy for nonlinear elastic inversion of
Marquardt, D.W., 1963. An algorithm for least squares estimation of seismic re£ection data, Geophysics, 51, 1893^1903.
nonlinear parameters, J. Soc. Industr. appl. Math., 11, 431^441. Tarantola, A., 1987. Inverse Problem Theory: Methods for Data Fitting
Mora, P.R., 1987a. Nonlinear two-dimensional elastic inversion of and Parameter Estimation, Elsevier, Amsterdam.
multio¡set seismic data., Geophysics, 52, 1211^1228. Tarantola, A. & Valette, B., 1982. Generalized nonlinear inverse
Mora, P.R., 1987b. Elastic wave¢eld inversion for low and high wave- problems solved using the least-squares criterion, Rev. Geophys.
numbers of the P- and S-wave velocities, a possible solution, in Space Phys., 20, 219^232.
Deconvolution and Inversion, pp. 321^337, eds Bernabini, M., Varela, C.L., Sto¡a, P.L. & Sen, M.K., 1996. Automatic background
Carrion, P., Jacovetti, G., Rocca, F., Treitel, S. & Worthington, M., velocity estimation in 2D media, 58th Mtg. Eur. Assoc. Expl
Blackwell Scienti¢c Publications, Oxford. Geophys., Extended AbstractsöGeophysical Division EAEG,
Mora, P., 1989. Inversion~migrationztomography, Geophysics, 54, session X009.
1575^1586. Wang, Y. & Housemann, G.A., 1995. Tomographic inversion of
Oristaglio, M.L. & Worthington, M.H., 1980. Inversion of surface re£ection seismic amplitude data for velocity variation, Geophys. J.
and borehole electromagnetic data for two-dimensional electrical Int., 123, 355^372.
conductivity models, Geophys. Prospect., 28, 633^657. Williamson, P.R., 1990. Tomographic inversion in re£ection
Ory, J. & Pratt, R.G., 1995. Are our parameter estimates biased? The seismology, Geophys. J. Int., 100, 255^274.
signi¢cance of ¢nite di¡erence operators, Inverse Problems, 11, Wu, R.S. & Toksoz, M.N., 1987. Di¡raction tomography and
397^424. multisource holography applied to seismic imaging, Geophysics, 52,
Pratt, R.G., 1990. Inverse theory applied to multi-source cross-hole 11^25.
tomography. Part II: elastic wave-equation method, Geophys. Xu, T., McMechan, G.A. & Sun, R., 1995. 3-D prestack full-wave¢eld
Prospect., 38, 311^330. inversion, Geophysics, 60, 1805^1818.
Pratt, R.G. & Goulty, N.R., 1991. Combining wave-equation imaging Zhou, C., Cai, W., Luo, Y., Schuster, G.T. & Hassanzadeh, S., 1995.
with traveltime tomography to form high-resolution images from Acoustic wave-equation traveltime and waveform inversion of
crosshole data, Geophysics, 56, 208^224. crosshole seismic data, Geophysics, 60, 765^773.
Pratt, R.G. & Worthington, M.H., 1990. Inverse theory applied to Zienkiewicz, O.C. & Taylor, R.L., 1989. The Finite Element Method,
multi-source cross-hole tomography. Part I: acoustic wave-equation 4th edn, McGraw-Hill, London.
method, Geophys. Prospect., 38, 287^310.
Pratt, R.G., McGaughey, W.G. & Chapman, C.H., 1993. Anisotropic
velocity tomography: a case study in a near-surface rock mass, A PPE N D I X A : A LT E RNAT I V E
Geophysics, 58, 1748^1763. PA R A MET R I Z AT I O N
Pratt, R.G., Shipp, R.M., Song, Z.M. & Williamson, P.R., 1995. Fault
delineation by wavefield inversion of cross-borehole seismic data, 57th Mtg. Eur. Assoc. Expl. Geophys., Extended Abstracts, EAEG, 95, Session D001.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P., 1992. Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd edn, Cambridge University Press, Cambridge.
Rodi, W.L., 1976. A technique for improving the accuracy of finite element solutions for magnetotelluric data, Geophys. J. R. astr. Soc., 44, 483-506.
Santosa, F., 1987. Inversion of band-limited reflection seismograms using stacking velocities as constraints, Inverse Problems, 3, 477-499.
Sen, M.K. & Stoffa, P.L., 1991. Nonlinear one-dimensional seismic waveform inversion using simulated annealing, Geophysics, 56, 1624-1638.
Shin, C.S., 1988. Nonlinear elastic wave inversion by blocky parameterization, PhD thesis, University of Tulsa, OK.
Snieder, R., 1990. A perturbative analysis of non-linear inversion, Geophys. J. Int., 101, 545-556.
Snieder, R., Xie, M.Y., Pica, A. & Tarantola, A., 1989. Retrieving both the impedance contrast and background velocity: a global strategy for the seismic reflection problem, Geophysics, 54, 991-1000.

APPENDIX A

In this appendix we clarify the relationship between the variables used to parametrize the inverse problem and the variables used to compute the elements of the complex impedance matrix, S. In the finite-difference method, the elements of S are computed using standard schemes from a knowledge of the model at a discrete set of l nodal points. However, it is not necessary, or even desirable, to use these nodal parameters as the inversion parameters. We shall represent the nodal parameters using the ql × 1 column vector m, from which the impedance matrix is computed directly. The number q represents the number of different physical parameters required at each node point (for example, for a problem parametrized in terms of velocity and density, q = 2). We shall represent the inversion parameters using the m × 1 column vector p (m is independent of l). In order to compute a forward result from a given inversion result, we require a method for computing the elements of S from a given p. It is also necessary to be able to compute the gradient of the misfit function with respect to these parameters and, for the Hessian matrix, to be able to compute the partial derivatives of the data with respect to these parameters.
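The separation between nodal modelling parameters and inversion parameters can be illustrated with a minimal numerical sketch. The following is not from the paper: it assumes a 1-D, single-parameter model (q = 1), a hat-function (linear interpolation) basis, and invented values throughout. A short inversion vector p is expanded to nodal values m = Ap, and a nodal-space gradient is projected back with the transpose of the same matrix, as in eqs (A2) and (A7) of this appendix.

```python
# Toy sketch (not from the paper): a coarse 1-D inversion parametrization.
# The nodal model m (one parameter per node, q = 1) is obtained from a short
# inversion vector p by linear interpolation; a nodal-space gradient is
# projected back with the transpose of the same interpolation matrix.

def interpolation_matrix(n_nodes, n_params):
    """Rows: nodal points; columns: hat (linear interpolation) basis vectors."""
    A = [[0.0] * n_params for _ in range(n_nodes)]
    for i in range(n_nodes):
        # position of node i in "parameter coordinates"
        x = i * (n_params - 1) / (n_nodes - 1)
        j = min(int(x), n_params - 2)
        w = x - j
        A[i][j] = 1.0 - w
        A[i][j + 1] = w
    return A

def mat_vec(A, v):
    """m = A p : expand inversion parameters onto the nodal grid."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def mat_t_vec(A, v):
    """g_p = A^t g_m : restrict a nodal gradient to the inversion parameters."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

n_nodes, n_params = 9, 3          # many nodes, few inversion parameters
A = interpolation_matrix(n_nodes, n_params)
p = [1500.0, 2000.0, 2500.0]      # e.g. velocities at three control points
m = mat_vec(A, p)                 # nodal model, m = A p
grad_m = [0.1] * n_nodes          # a made-up nodal-space gradient
grad_p = mat_t_vec(A, grad_m)     # projected gradient, A^t grad_m
```

Note that each row of A sums to one (the hat functions form a partition of unity), so a constant p maps to a constant nodal model; the names and values above are purely illustrative.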

© 1998 RAS, GJI 133, 341-362
360 R. G. Pratt, C. Shin and G. J. Hicks

If one chooses inversion parameters that can be used to compute modelling variables through linear combinations of basis (or interpolation) functions, then further simplification of the quantities required in the FDI problem is possible. For continuous functions m(r), such a scheme can be represented by

$$ m(\mathbf{r}) = \sum_i a^{(i)}(\mathbf{r})\, p_i \,, \qquad \text{(A1)} $$

where {a^(i)(r)} is a set of m basis functions (the interpolation functions), and p_i are the components of the vector p. Eq. (A1) yields a continuous expression for the model parameter m(r). However, in order to generate the impedance matrix S we only require the values of the model parameters at the nodal points. Evaluating (A1) at the l nodal point locations {r_i}, we obtain

$$ \mathbf{m} = \sum_i \mathbf{a}^{(i)} p_i = \mathbf{A}\mathbf{p} \,, \qquad \text{(A2)} $$

where m is the ql × 1 vector of modelling parameters, a^(i) are the values of the basis functions at the nodal points (the basis vectors), and A is a (real-valued) projection matrix whose columns comprise these basis vectors.

Many examples of this class of parametrization have been suggested. For example, it subsumes the use of the modelling parameters themselves as inversion parameters (i.e. p = m). This scheme is referred to as the 'point collocation' scheme, for which the basis functions are spatial delta functions and for which there is a one-to-one mapping of the elements of p to the elements of m. This choice is the most obvious one, and it has been used frequently in waveform inversion (e.g. Mora 1987; but see Xu et al. 1995 for a counter-example). However, the degree of discretization required to model the forward problem accurately (multiple node points per wavelength) greatly exceeds the resolution of most geophysical measurement techniques. The point collocation scheme is thus an extreme choice that is wasteful of computing resources and leads to unnecessary non-uniqueness in the inverse problem. This is especially important when the early stages of a non-linear inverse scheme are used to solve only for the large-scale (smooth) parameters [as advocated by Williamson (1990) and Bunks et al. (1995)]. At the other extreme, one may choose to solve for only a single parameter, say the average velocity (in which case the interpolation function is simply a constant). Between these extremes lie a variety of possible schemes, including cubic splines (as in Farra & Le Bégat 1995), Chebyshev polynomials (as in Grechka & McMechan 1997), truncated Fourier series (as in Wang & Houseman 1995), and bilinear interpolation of parameters on a coarse grid (as in Pratt, McGaughey & Chapman 1993). All of these schemes can be expressed in the form (A1).

Eq. (A2) allows us to generate the impedance matrix S from the current inversion parameters in two stages: first, compute the nodal parameters, m, from the inversion parameters, p, using (A2); then generate the impedance matrix using the appropriate finite-difference scheme. In order to compute the elements of the Fréchet derivative matrix, J, we observe from eq. (17) in the body of the paper that we require the virtual source matrix, F, the columns of which comprise the m virtual source vectors f^(i) (one for each inversion parameter), given by

$$ \mathbf{f}^{(i)} = -\frac{\partial \mathbf{S}}{\partial p_i}\,\mathbf{u} \,, \qquad \text{(A3)} $$

where u is the predicted (forward-propagated) wavefield in the current model, at each of the l node points. Let us use eqs (A2) and (A3) to obtain a relationship between the required virtual source vectors and the virtual source vectors for a point collocation scheme. We have

$$ \mathbf{f}^{(i)} = -\frac{\partial \mathbf{S}}{\partial p_i}\,\mathbf{u}
= -\left[\sum_{j}^{q\times l} \frac{\partial \mathbf{S}}{\partial m_j}\,\frac{\partial m_j}{\partial p_i}\right]\mathbf{u}
= -\left[\sum_{j}^{q\times l} \frac{\partial \mathbf{S}}{\partial m_j}\, a_j^{(i)}\right]\mathbf{u}
= -\sum_{j}^{q\times l} \frac{\partial \mathbf{S}}{\partial m_j}\,\mathbf{u}\; a_j^{(i)}
= \sum_{j}^{q\times l} \mathbf{g}^{(j)}\, a_j^{(i)} \qquad \text{(A4)} $$

(no implied summation), where we have introduced the virtual sources under the point collocation scheme,

$$ \mathbf{g}^{(j)} = -\frac{\partial \mathbf{S}}{\partial m_j}\,\mathbf{u} \,, \qquad \text{(A5)} $$

which are trivial to compute from the finite-difference formulation (or from the finite-element formulation). From eq. (A4) we may conclude that the virtual source matrix is given by

$$ \mathbf{F} = \mathbf{G}\mathbf{A} \,, \qquad \text{(A6)} $$

where G is the virtual source matrix for the point collocation scheme, the columns of which comprise the q × l virtual source vectors g^(i) (one for each of the q physical parameters at each of the l nodes). Eqs (A4) and (A5) show that the virtual sources required for the scheme (A2) may be obtained by interpolating the virtual sources for the point collocation scheme.

From eqs (19) and (A6) we can generate expressions for the gradient vector,

$$ \nabla_{\mathbf{p}} E = \mathrm{Re}\{\mathbf{F}^t \mathbf{v}\} = \mathrm{Re}\{\mathbf{A}^t \mathbf{G}^t \mathbf{v}\} = \mathbf{A}^t\, \mathrm{Re}\{\nabla_{\mathbf{m}} E\} \,, \qquad \text{(A7)} $$

and for the approximate Hessian,

$$ \mathbf{H}_a = \mathrm{Re}\{\mathbf{J}^t \mathbf{J}^*\} = \mathbf{A}^t\, \mathrm{Re}\{\mathbf{K}^t \mathbf{K}^*\}\,\mathbf{A} \qquad \text{(A8)} $$

[which is obvious on consideration of eqs (17) and (A6)], where K is the Fréchet derivative matrix for the point collocation scheme. These equations show that the gradient vector and the approximate Hessian matrix can be obtained from the equivalent quantities for the point collocation scheme, using the matrix A^t as a projection matrix from the space of modelling parameters to the space of inversion parameters.

Relationship to other schemes

The relationships between the point collocation scheme and the model parametrizations given by eqs (A2), (A7) and (A8) can be directly related to the subspace search method developed by Kennett et al. (1988). If we express the modelling-parameter updates at each iteration of the Gauss-Newton method by inserting eqs (A7) and (A8) into eq. (35) in the body of this paper, and use eq. (A2) to project onto the modelling parameters, we obtain

$$ \delta\mathbf{m} = \mathbf{A}\left(\mathbf{A}^t\,\mathrm{Re}\{\mathbf{K}^t \mathbf{K}^*\}\,\mathbf{A} + \lambda\mathbf{I}\right)^{-1}\mathbf{A}^t\,\mathrm{Re}\{\mathbf{K}^t\,\delta\mathbf{d}^*\} \,, \qquad \text{(A9)} $$

Frequency–space seismic waveform inversion 361

which is identical to eq. (2.16) of Kennett et al. (1988), except that we are working with complex-valued data, and that we have again omitted a rigorous consideration of the role of the a priori model covariance matrix. We reiterate that the covariances have been omitted for reasons of clarity, and that they could be incorporated in a straightforward manner.

More recently, Bunks et al. (1995) have advocated a 'multiscale' approach to seismic-waveform inversion, in which the large-scale parameters of the problem are solved for early in an iterative scheme, in order to obtain a 'reasonable degree of convergence to the neighbourhood of the global minimum [of the misfit function]'. They advocate decomposing the data by (temporal) scale (i.e. frequency) and decomposing the model by (spatial) scale (i.e. wavenumber). The low-frequency components of the data are used in the initial iterations to locate the long-scale, small-wavenumber components of the model. Proceeding from the low-frequency components of the data to the higher frequencies has also been advocated by Pratt & Goulty (1991) and Song et al. (1995). The strategy of proceeding from the long wavelengths of the model to the short wavelengths is comparable to the approach taken by Williamson (1990).

The methods we have developed in this appendix are exactly what is required for a formal multiscale approach. The decomposition of the data by frequency is inherent to the FDI approach: it is entirely natural to use the low frequencies early in the iterative solution. A decomposition by wavenumber could be implemented by using a small number of basis vectors in the description of the inversion parameters, representing only the long-wavelength components of the model. In the language of the multigrid methods described by Bunks et al. (1995), we require a restriction operator that restricts a given model to its large-scale components, and an injection operator that interpolates a large-scale solution to smaller scale lengths. Eq. (A9) can be interpreted as the following set of operations:

(1) obtain the gradient for the point collocation scheme, ∇_m E;
(2) use the projection matrix A^t as a restriction operator to restrict the gradient ∇_m E to its large-scale components;
(3) use the restricted gradient and the approximate inverse Hessian as a 'relaxation operator' to find a model update at these large scale lengths;
(4) finally, project the large-scale solution onto the (small-scale) model space m using the projection matrix A as the injection operator.

APPENDIX B: COMPARISON WITH SNIEDER (1990)

In this appendix we compare the formula given in this paper for the non-linear term in the inverse Hessian with the methods developed by Snieder (1990) for incorporating non-linear effects into iterative non-linear inverse methods. For the purposes of this comparison we follow the steps used by Snieder, but we use the notation, and the matrix methods, developed in this paper to obtain an explicit formula for the first-order multiple correction term.

Forward modelling is non-linear in the parameters p. Thus, if we perturb the parameters by an amount εδp, the forward problem constitutes a non-linear mapping from the parameter perturbations to the data perturbation

$$ \delta\mathbf{d}(\epsilon) = F(\epsilon\,\delta\mathbf{p}) = \epsilon\,\delta\mathbf{d}^{(1)} + \epsilon^2\,\delta\mathbf{d}^{(2)} + \ldots = \epsilon\,F^{(1)}\delta\mathbf{p} + \epsilon^2\,\delta\mathbf{p}^t F^{(2)}\delta\mathbf{p} + \ldots \qquad \text{(B1)} $$

(note that the vector δd used here is not defined in exactly the same manner as the same vector in the body of the paper, although the two quantities are analogous). In eq. (B1),

$$ \delta\mathbf{d}^{(1)} = F^{(1)}\delta\mathbf{p} = \mathbf{J}\,\delta\mathbf{p} \qquad \text{(B2)} $$

and

$$ \delta\mathbf{d}^{(2)} = \delta\mathbf{p}^t F^{(2)}\delta\mathbf{p} \,. \qquad \text{(B3)} $$

The second data perturbation vector, δd^(2), is defined in terms of its components

$$ \delta d_i^{(2)} = \delta\mathbf{p}^t\,\mathbf{J}_i^{(2)}\,\delta\mathbf{p} \,, \qquad \text{(B4)} $$

with

$$ \mathbf{J}_i^{(2)} = \frac{\partial^2 u_i}{\partial \mathbf{p}^t\,\partial \mathbf{p}} \,. \qquad \text{(B5)} $$

In these equations, J is the Fréchet derivative matrix introduced in eq. (11) and J_i^(2) is the matrix of second partial derivatives of the ith data component. The Taylor expansion of the forward problem above is commonly referred to as the Born series; the first term is the first Born approximation, the second term the second Born approximation, and so on.

The inverse problem is a non-linear mapping from the data to the estimated parameters, δp̂(ε):

$$ \delta\hat{\mathbf{p}}(\epsilon) = I(\delta\mathbf{d}(\epsilon)) = \delta\hat{\mathbf{p}}^{(1)} + \delta\hat{\mathbf{p}}^{(2)} + \ldots = I^{(1)}\delta\mathbf{d} + \delta\mathbf{d}^t I^{(2)}\delta\mathbf{d} + \ldots \,, \qquad \text{(B6)} $$

in which the terms in the Taylor expansion involve the operators I^(1), I^(2), ..., yet to be determined.

If we wish the estimated parameters δp̂(ε) to reproduce the ideal model, εδp, as accurately as possible, then we may assume the equality δp̂(ε) → εδp (see Snieder 1990 for a more rigorous discussion of why this approach is valid). Following this, we insert eq. (B1) into eq. (B6) and equate equal powers of ε to obtain

$$ \delta\mathbf{p} = I^{(1)}\mathbf{J}\,\delta\mathbf{p} \,, \qquad \text{(B7)} $$

$$ 0 = I^{(1)}\,\delta\mathbf{p}^t F^{(2)}\delta\mathbf{p} + \delta\mathbf{p}^t \mathbf{J}^t I^{(2)} \mathbf{J}\,\delta\mathbf{p} \,, \qquad \text{(B8)} $$

and so on. Eq. (B7) is satisfied when I^(1) is the standard (Gauss-Newton) least-squares estimator for the linearized problem, that is, when

$$ \delta\hat{\mathbf{p}}^{(1)} = I^{(1)}\delta\mathbf{d} = \mathbf{H}_a^{-1}\,\mathrm{Re}\{\mathbf{J}^t\,\delta\mathbf{d}^*\} \,, \qquad \text{(B9)} $$

in which

$$ \mathbf{H}_a = \mathrm{Re}\{\mathbf{J}^t \mathbf{J}^*\} \qquad \text{(B10)} $$

is the approximate Hessian (i.e. the first term in eq. 32). Eq. (B8), obtained by equating terms in ε², can be rewritten using (1) the substitution from the first Born approximation, εJδp ≈ δd, and (2) the first term in the inverse problem


expansion, εδp ≈ δp̂^(1):

$$ \delta\hat{\mathbf{p}}^{(2)} = \delta\mathbf{d}^t I^{(2)}\delta\mathbf{d} = -I^{(1)}\,\delta\hat{\mathbf{p}}^{(1)t} F^{(2)}\,\delta\hat{\mathbf{p}}^{(1)} \,, \qquad \text{(B11)} $$

in which the left-hand side is the next term in the inverse series, which we can now compute, given the operator F^(2) defined in eqs (B3) and (B4).

Substituting eq. (B11) into the inverse problem expansion (B6), we obtain an explicit expression for the first two terms of the non-linear inverse mapping:

$$ \delta\hat{\mathbf{p}} = \delta\hat{\mathbf{p}}^{(1)} - I^{(1)}\,\delta\hat{\mathbf{p}}^{(1)t} F^{(2)}\,\delta\hat{\mathbf{p}}^{(1)}
= \delta\hat{\mathbf{p}}^{(1)} - \mathbf{H}_a^{-1}\,\mathrm{Re}\{\mathbf{J}^t\,\delta\hat{\mathbf{d}}^{(2)*}\}
= \delta\hat{\mathbf{p}}^{(1)} - \mathbf{H}_a^{-1}\,\mathrm{Re}\{\mathbf{J}^T\,\delta\hat{\mathbf{d}}^{(2)}\} \qquad \text{(B12)} $$

[the superscript T represents the Hermitian (conjugate) transpose, and we are thus conjugating both terms inside the brackets], where the ith component of the vector δd̂^(2) is given by

$$ \delta\hat{d}_i^{(2)} = \delta\hat{\mathbf{p}}^{(1)t}\,\mathbf{J}_i^{(2)}\,\delta\hat{\mathbf{p}}^{(1)} \,. \qquad \text{(B13)} $$

The linearized inverse δp̂^(1) is defined in eq. (B9). The second term in eq. (B12) corrects the linearized inverse by subtracting the estimated non-linear components from the inverse. The second-order algorithm for non-linear FDI thus proceeds as follows:

(1) form the first-order (linearized) inverse δp̂^(1) using eq. (B9);
(2) compute the second (forward) Born approximation from the linearized inverse to estimate the first-order multiples, using eq. (B13);
(3) using eq. (B12), apply the first-order inverse operator to the estimated multiples, and finally subtract the result from δp̂^(1).

At no time need the forward model be updated: all quantities are computed in the original background model.

It is interesting to try to manipulate the full Newton inversion formula (see eq. 41 in the body of the paper) to obtain a form similar to eq. (B12):

$$ [\mathbf{H}_a + \mathbf{R}]\,\delta\mathbf{p} = \mathrm{Re}\{\mathbf{J}^t\,\delta\mathbf{d}^*\} $$
$$ [\mathbf{I} + \mathbf{H}_a^{-1}\mathbf{R}]\,\delta\mathbf{p} = \mathbf{H}_a^{-1}\,\mathrm{Re}\{\mathbf{J}^t\,\delta\mathbf{d}^*\} = \delta\hat{\mathbf{p}}^{(1)} $$
$$ \delta\mathbf{p} = [\mathbf{I} + \mathbf{H}_a^{-1}\mathbf{R}]^{-1}\,\delta\hat{\mathbf{p}}^{(1)} \qquad \text{(B14)} $$
$$ \delta\mathbf{p} = [\mathbf{I} - \mathbf{H}_a^{-1}\mathbf{R} + \{\mathbf{H}_a^{-1}\mathbf{R}\}^2 - \ldots]\,\delta\hat{\mathbf{p}}^{(1)} \,. $$

If we retain only the first two terms in the series expansion and substitute the expression for R given in eq. (34) in the body of the paper, we obtain an approximate version of the full Newton algorithm:

$$ \delta\hat{\mathbf{p}} = \delta\hat{\mathbf{p}}^{(1)} - \mathbf{H}_a^{-1}\,\mathrm{Re}\left\{\frac{\partial \mathbf{J}^t}{\partial \mathbf{p}^t}\,(\delta\mathbf{d}^* \cdots \delta\mathbf{d}^*)\right\}\delta\hat{\mathbf{p}}^{(1)} \,. \qquad \text{(B15)} $$

If eq. (B15) is expressed in terms of the components of the resulting parameter estimates, δp̂_i, we obtain the following expression for the second-order corrections to the parameter estimates (making use of the implied summation convention):

$$ \delta\hat{p}_i^{(2)} = [-\mathbf{H}_a^{-1}]_{ik}\,\mathrm{Re}\left\{ J^*_{lm}\,\delta\hat{p}_m^{(1)}\,\frac{\partial^2 u_l}{\partial p_k\,\partial p_j}\,\delta\hat{p}_j^{(1)} \right\} \quad \text{(Newton)} \qquad \text{(B16)} $$

(where we have substituted the predicted data from the linearized inversion, δd ≈ J δp̂^(1), into eq. B15). This form is remarkably similar to the expression we obtain from eq. (B12), Snieder's algorithm:

$$ \delta\hat{p}_i^{(2)} = [-\mathbf{H}_a^{-1}]_{ik}\,\mathrm{Re}\left\{ J^*_{lk}\,\frac{\partial^2 u_l}{\partial p_m\,\partial p_j}\,\delta\hat{p}_m^{(1)}\,\delta\hat{p}_j^{(1)} \right\} \quad \text{(Snieder)} \qquad \text{(B17)} $$

(again making use of the implied summation convention). The two approaches, eqs (B16) and (B17), contain exactly the same terms, but differ in the order of summation applied.

The essential similarities and differences between the two approaches are summarized as follows.

(1) Both equations use the approximate inverse Hessian H_a^{-1} to refocus the correction term.
(2) Eq. (B16) predicts the correction by correlating the second-order partial derivatives with the (true) data residuals (one correlation is obtained from each possible pair of nodes in the model). For each node in the image, the correlations for this node with all others are multiplied with the linearized parameter estimates and summed over all nodes.
(3) Eq. (B17) seeks the same corrections by first predicting only the non-linear effects in the data, and then using the linearized inverse to estimate the effects on the model estimates.
(4) Eq. (B16) contains only the first two terms in the inverse series for the Hessian matrix. We thus expect the full inverse, as described in the body of the paper, to be a more effective operator than the one embodied in eq. (B16). In contrast, in developing eq. (B17) the higher-order terms were explicitly neglected.
(5) Computing the second-order partial derivatives in eq. (B17) explicitly would require m² computations. However, we can use a simple forward-propagation algorithm to compute the action of this operator, so that the vector δd̂^(2) in eq. (B13) can be computed using

$$ \delta\hat{\mathbf{d}}^{(2)} = -\mathbf{S}^{-1}\,\mathbf{f}^{(ij)}\,\delta p_i\,\delta p_j \qquad \text{(B18)} $$

(summation is implied over the indices i and j), where f^(ij) are the second-order virtual sources defined in eq. (47). As in the computation of R by backpropagation, this computation also requires only m additional forward problems to be solved (still in the same model) to obtain the second-order virtual sources.

