
Introduction to Deconvolution and Inversion

The convolutional seismic trace model

[Figure: the geological sequence, the ρV-log and the seismic record. Anstey, Fig. 8]

The convolutional seismic trace model

[Figure: the development of the seismic reflection waveform from the geologic sequence. Anstey, Fig. 51]
Acoustic impedances, reflection coefficients and inversion

Discrete inversion

For interface i between layer i (with \rho_i, V_i) and layer i+1 (with \rho_{i+1}, V_{i+1}):

r_i = \frac{\rho_{i+1}V_{i+1} - \rho_i V_i}{\rho_{i+1}V_{i+1} + \rho_i V_i} = \frac{Z_{i+1} - Z_i}{Z_{i+1} + Z_i}

1 + r_i = \frac{2 Z_{i+1}}{Z_{i+1} + Z_i}, \qquad 1 - r_i = \frac{2 Z_i}{Z_{i+1} + Z_i}

\frac{Z_{i+1}}{Z_i} = \frac{1 + r_i}{1 - r_i}
\;\Rightarrow\;
Z_{i+1} = Z_i \, \frac{1 + r_i}{1 - r_i}
\;\Rightarrow\;
Z_n = Z_1 \prod_{i=1}^{n-1} \frac{1 + r_i}{1 - r_i}

Continuous inversion

Note the time variable!

r(t) = \frac{Z(t+dt) - Z(t)}{Z(t+dt) + Z(t)} = \frac{1}{2}\,\frac{dZ(t)}{Z(t)} = \frac{1}{2}\,\frac{d \ln Z(t)}{dt}

Z(t) = Z(0)\,\exp\!\left[\, 2 \int_0^t r(\tau)\, d\tau \right] \qquad \text{valid if } |r(t)| < 0.3

r(t) = \frac{1}{2 Z(t)}\,\frac{dZ(t)}{dt}
\;\Leftrightarrow\;
Z(t) \approx Z(0)\left[\, 1 + 2 \int_0^t r(\tau)\, d\tau \right] \qquad \text{(linearized form, valid if } |r(t)| \approx 0.1 \text{)}
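The discrete recursion Z_{i+1} = Z_i (1 + r_i)/(1 - r_i) is easy to turn into a trace-integration style inversion. The following is a minimal NumPy sketch (my own illustration, not from the slides; the function name and the starting impedance value are hypothetical):

    import numpy as np

    def impedance_from_reflectivity(r, z0):
        """Recursive inversion: Z_{i+1} = Z_i * (1 + r_i) / (1 - r_i)."""
        z = [z0]
        for ri in r:
            z.append(z[-1] * (1.0 + ri) / (1.0 - ri))
        return np.array(z)

    # small synthetic example with three interfaces
    r = np.array([0.10, -0.05, 0.20])
    z = impedance_from_reflectivity(r, z0=2.0e6)   # e.g. Z in kg m^-2 s^-1
    print(z)

In practice the reflectivity estimated from a seismic trace is band-limited and noisy, so this recursion is only applied after careful wavelet removal and scaling.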
[Figure: inversion example. SEG]
Convolution of the functions f(t) and g(t)

[Figure: f(t) and g(t) plotted versus t]

Convolution:

h(t) = f(t) \otimes g(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau = \int_{-\infty}^{\infty} f(t-\tau)\, g(\tau)\, d\tau = g(t) \otimes f(t)
Convolution of the functions f(t) and g(t)

[Figure: f(\tau) and the reversed, shifted g(t-\tau) plotted versus \tau]

convolution in time:
h(t) = f(t) \otimes g(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau = \int_{-\infty}^{\infty} f(t-\tau)\, g(\tau)\, d\tau

multiplication in frequency:
H(\nu) = F(\nu) \cdot G(\nu) = G(\nu) \cdot F(\nu)

Convolution in the discrete (= sampled) time domain \Leftrightarrow multiplication in the Z-domain.

Convolution of sampled data:

a_t = (a_0, a_1, \ldots, a_n)
b_t = (b_0, b_1, \ldots, b_m)

c_i = a_i \otimes b_i = \sum_{k=0}^{m} a_{i-k}\, b_k = \sum_{k=0}^{n} a_k\, b_{i-k}

Multiplication of Z-transforms:

A(z) = a_0 + a_1 z + \ldots + a_n z^n
B(z) = b_0 + b_1 z + \ldots + b_m z^m
C(z) = A(z) \cdot B(z) = c_0 + c_1 z + \ldots + c_{m+n} z^{m+n}
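A small sketch of this equivalence (my own illustration; any NumPy routine that multiplies polynomials would do):

    import numpy as np

    a = np.array([1.0, -0.5, 0.25])        # a_t, length n+1
    b = np.array([2.0, 1.0, 0.0, -1.0])    # b_t, length m+1

    c_conv = np.convolve(a, b)             # discrete convolution, length n+m+1
    c_poly = np.polymul(a[::-1], b[::-1])[::-1]   # product of the Z-transform polynomials
                                                  # (np.polymul expects highest power first)

    print(np.allclose(c_conv, c_poly))     # True: convolution <=> polynomial multiplication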
Convolution in matrix notation

a_t = (a_0, a_1, \ldots, a_n), \qquad b_t = (b_0, b_1, \ldots, b_m)

c_i = a_i \otimes b_i = \sum_{k=0}^{m} a_{i-k}\, b_k = \sum_{k=0}^{n} a_k\, b_{i-k}

The convolution can be written as a matrix-vector product in two equivalent ways. [A] is the
(n+m+1, m+1) Toeplitz matrix whose m+1 columns contain shifted copies of a_t, and [B] is the
(m+n+1, n+1) Toeplitz matrix built in the same way from b_t:

[A]\,\vec{b} =
\begin{pmatrix}
a_0    &        &        &        \\
a_1    & a_0    &        &        \\
\vdots & a_1    & \ddots &        \\
a_n    & \vdots & \ddots & a_0    \\
0      & a_n    &        & a_1    \\
\vdots &        & \ddots & \vdots \\
0      & \cdots & 0      & a_n
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_m \end{pmatrix}
=
\begin{pmatrix}
a_0 b_0 \\ a_1 b_0 + a_0 b_1 \\ \vdots \\ a_n b_0 + \ldots + a_0 b_n \\ \vdots \\ a_n b_{m-1} + a_{n-1} b_m \\ a_n b_m
\end{pmatrix}
= \vec{c}
\qquad\text{or}\qquad
[B]\,\vec{a} = \vec{c}

with the roles of a_t and b_t interchanged in [B]. Dimensions: [A] is (n+m+1, m+1) and \vec{b} has
m+1 elements; [B] is (m+n+1, n+1) and \vec{a} has n+1 elements; \vec{c} has n+m+1 elements.
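A minimal sketch of the convolution matrix [A] (my own illustration; the helper name is hypothetical, and scipy.linalg.toeplitz is assumed to be available):

    import numpy as np
    from scipy.linalg import toeplitz

    def convolution_matrix(a, m_plus_1):
        """(n+m+1, m+1) Toeplitz matrix [A] such that A @ b == np.convolve(a, b)."""
        col = np.concatenate([a, np.zeros(m_plus_1 - 1)])   # first column: a padded with zeros
        row = np.zeros(m_plus_1)
        row[0] = a[0]                                        # first row: a_0, 0, ..., 0
        return toeplitz(col, row)

    a = np.array([1.0, -0.5, 0.25])
    b = np.array([2.0, 1.0, 0.0, -1.0])
    A = convolution_matrix(a, len(b))
    print(np.allclose(A @ b, np.convolve(a, b)))   # True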
The convolutional seismic trace model

[Diagram: the reflectivity r(t) ("earth response") is convolved with the wavelet w(t); additive
noise n(t) is added; the result is the measured data x(t), the seismic trace]

x(t) = \underbrace{w(t) \otimes r(t)}_{\text{signal } s(t)} + n(t) = s(t) + n(t)

The purpose of deconvolution is to reconstruct r(t) from x(t).


Deconvolution problems

• the convolutional model may be inappropriate
  (e.g. a source with a strong directivity pattern)
• the wavelet w(t) is not known
• disturbance of w(t) by, for example, ghost effects
• time variance of w(t) (e.g. absorption)
• lateral variance of w(t)
• presence of noise
• s(t) is disturbed by, for example, multiples
• frequency-band limitations
• preprocessing effects
• the validity of the necessary assumptions
Deconvolution principle

suppose:
s(t) = w(t) \otimes r(t)
S(z) = W(z) \cdot R(z)

purpose: estimate H(z) = \frac{1}{W(z)} and apply this filter to the data:

H(z) \cdot S(z) = \frac{1}{W(z)} \cdot W(z) \cdot R(z) = R(z), \qquad r(t) = \text{reflectivity}

Two forms of deconvolution:
deterministic: w(t) is supposed to be known
data-adaptive: w(t) is estimated from the data
Application of the deconvolution process

• determine w(t)
• determine h(t), the inverse of w(t)
• convolve the data with h(t)

properties:
• w(t) has to be known or estimated
• h(t) may be very long
• h(t) will not always exist
• noise may dominate the result in areas with a low S/N ratio
Definition of inverse filter or spiking filter or whitening filter

Given a time function w(t), the inverse time function h(t) = w^{-1}(t), also called the inverse
filter, spiking filter or whitening filter, is defined by:

h(t) \otimes w(t) = \delta(t)                                          time-continuous domain
H(\nu) \cdot W(\nu) = 1                                                frequency domain
h_\Delta(t) \otimes w_\Delta(t) = (\ldots, 0, 1, 0, \ldots)            discrete-time domain
H(z) \cdot W(z) = 1                                                    Z-domain

Hence:
H(\nu) = \frac{1}{W(\nu)}, \qquad H(z) = \frac{1}{W(z)}
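A minimal frequency-domain sketch of such an inverse filter (my own illustration; the function name, the FFT length and the small stabilization constant eps are assumptions added to keep the division 1/W(\nu) well behaved where W(\nu) is small):

    import numpy as np

    def inverse_filter_freq(w, nfft=256, eps=1e-3):
        """Approximate spiking filter: H(nu) = 1 / W(nu), with a small stabilization eps."""
        W = np.fft.rfft(w, nfft)
        H = np.conj(W) / (np.abs(W) ** 2 + eps)   # regularized 1/W
        return np.fft.irfft(H, nfft)

    w = np.array([1.0, -0.8, 0.3])                 # example wavelet
    h = inverse_filter_freq(w)
    spike = np.convolve(h, w)[: len(w) + 4]        # should approximate (1, 0, 0, ...)
    print(np.round(spike, 3))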
Least-squares filtering: Wiener filtering

[Diagram: the input a_t passes through the filter f_t, giving the actual output c_t = f_t \otimes a_t;
the error is e_t = d_t - c_t, with d_t the desired output]

input a_t: samples 0, ..., n
filter f_t: samples 0, ..., m
actual output c_t = f_t \otimes a_t: samples 0, ..., n+m
desired output d_t: samples 0, ..., n+m

Objective: design the filter such that \sum_t e_t^2 = \sum_t (d_t - c_t)^2 is minimized.
Least-squares filtering

input: a_t = a_0, \ldots, a_n
filter: f_t = f_0, \ldots, f_m
actual output: c_t = c_0, \ldots, c_{n+m}
desired output: d_t = d_0, \ldots, d_{n+m}

Objective is the minimization of

E = E(f_0, \ldots, f_m) = \sum_{t=0}^{m+n} e_t^2 = \sum_{t=0}^{m+n} (d_t - c_t)^2
  = \sum_{t=0}^{m+n} \left( d_t - \sum_{s=0}^{m} f_s\, a_{t-s} \right)^2

Minimization of E leads to the normal equations:

\frac{\partial E}{\partial f_j} = 2 \sum_{t=0}^{m+n} \left( d_t - \sum_{s=0}^{m} f_s\, a_{t-s} \right)(-a_{t-j})
  = 2 \sum_{t=0}^{m+n} e_t\, (-a_{t-j}) = 0 \qquad \text{for } j = 0, \ldots, m

i.e. the error is orthogonal to the input. This set of equations can be written as

\begin{pmatrix} r_0 & r_1 & \cdots & r_m \\ r_1 & r_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_1 \\ r_m & \cdots & r_1 & r_0 \end{pmatrix}
\begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_m \end{pmatrix}
=
\begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_m \end{pmatrix}
\qquad\text{or}\qquad
[R]\,\vec{f} = \vec{g}
\qquad\text{i.e.}\qquad
A^T A\, \vec{f} = A^T \vec{d}

with the autocorrelation of the input
r_{j-s} = \sum_{t=0}^{m+n} a_{t-s}\, a_{t-j} = r_{s-j}
and the crosscorrelation of input and desired output
g_j = \sum_{t=0}^{m+n} d_t\, a_{t-j}
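A minimal sketch of solving these Toeplitz normal equations for a spiking filter (my own illustration; the function name is hypothetical and scipy.linalg.solve_toeplitz is assumed to be available):

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def ls_filter(a, d, m):
        """Least-squares (Wiener) filter of length m+1: solves [R] f = g."""
        acf = np.correlate(a, a, mode="full")[len(a) - 1:]          # autocorrelation, lags 0, 1, ...
        r = np.zeros(m + 1)
        n_lags = min(m + 1, len(acf))
        r[:n_lags] = acf[:n_lags]                                   # r_0 ... r_m (zero beyond lag n)
        dp = np.concatenate([d, np.zeros(max(0, len(a) + m - len(d)))])
        g = np.array([np.dot(a, dp[j : j + len(a)]) for j in range(m + 1)])  # g_j = sum_t d_t a_{t-j}
        return solve_toeplitz(r, g)

    w = np.array([1.0, -0.8, 0.3])                     # input wavelet (minimum phase)
    d = np.zeros(len(w) + 20); d[0] = 1.0              # desired output: a spike at t = 0
    f = ls_filter(w, d, m=20)
    print(np.round(np.convolve(f, w)[:5], 3))          # approximately (1, 0, 0, 0, 0)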
Least-squares filtering

input: a_t = a_0, \ldots, a_n
filter: f_t = f_0, \ldots, f_m
actual output: c_t = c_0, \ldots, c_{n+m}
desired output: d_t = d_0, \ldots, d_{n+m}

Objective is the minimization of:

E = E(f_0, \ldots, f_m) = \sum_{t=0}^{m+n} e_t^2 = \sum_{t=0}^{m+n} (d_t - c_t)^2
  = \sum_{t=0}^{m+n} \left( d_t - \sum_{s=0}^{m} f_s\, a_{t-s} \right)^2
  = \vec{e}^{\,T}\vec{e} = \left( \vec{d} - [A]\vec{f} \right)^T \left( \vec{d} - [A]\vec{f} \right)

The solution of this minimization can be formulated as:

\vec{f} = \left( [A]^T [A] \right)^{-1} [A]^T \vec{d}

with [A] the (n+m+1, m+1) convolution matrix

[A] =
\begin{pmatrix}
a_0    & 0      & \cdots & 0       \\
a_1    & a_0    & \ddots & \vdots  \\
\vdots & \ddots & \ddots & 0       \\
a_n    &        &        & a_0     \\
0      & a_n    & \ddots & \vdots  \\
\vdots & \ddots & \ddots & a_{n-1} \\
0      & \cdots & 0      & a_n
\end{pmatrix}

[A]^T [A] = [R] is the autocorrelation (matrix) of the input signal;
[A]^T \vec{d} is the crosscorrelation of the input signal and the desired output signal.
Linear least-squares estimation – expression for the error

[Diagram: \vec{d}, its projection \vec{c} = [A]\vec{f} onto the column space of [A], and the error
\vec{e} = \vec{d} - [A]\vec{f}, with [A]\vec{f} \perp \vec{e} = \vec{d} - \vec{c}]

E = \vec{e}^{\,T}\vec{e} = \left( \vec{d} - \vec{c} \right)^T \left( \vec{d} - \vec{c} \right)
  = \vec{d}^{\,T}\vec{d} - \vec{d}^{\,T}\vec{c} - \vec{c}^{\,T}\vec{d} + \vec{c}^{\,T}\vec{c}
  = \vec{d}^{\,T}\vec{d} - \vec{c}^{\,T}\vec{c}
\qquad \text{(i.e. no cross terms, because } [A]\vec{f} \perp \vec{e}\text{)}
(Weighted) linear least mean squares (llms) filtering

llms filtering:

E(f_0, \ldots, f_m) = \sum_{t=0}^{n+m} [d_t - c_t]^2
  = \sum_{t=0}^{n+m} \left( d_t - \sum_{s=0}^{m} f_s\, a_{t-s} \right)^2
  = \sum_{t=0}^{n+m} e_t^2
  = \left( \vec{d} - [A]\vec{f} \right)^T \left( \vec{d} - [A]\vec{f} \right) = \vec{e}^{\,T}\vec{e}

minimization of E leads to the normal equations:  [A]^T[A]\,\vec{f} = [A]^T\vec{d}
equation for the filter:  \vec{f} = \left( [A]^T[A] \right)^{-1} [A]^T\vec{d}

weighted llms filtering:

E(f_0, \ldots, f_m) = \sum_{t=0}^{n+m} w_t [d_t - c_t]^2
  = \sum_{t=0}^{n+m} w_t \left( d_t - \sum_{s=0}^{m} f_s\, a_{t-s} \right)^2
  = \sum_{t=0}^{n+m} w_t e_t^2
  = \left( \vec{d} - [A]\vec{f} \right)^T [W] \left( \vec{d} - [A]\vec{f} \right) = \vec{e}^{\,T}[W]\vec{e}

minimization of E leads to the normal equations:  [A]^T[W][A]\,\vec{f} = [A]^T[W]\vec{d}
equation for the filter:  \vec{f} = \left( [A]^T[W][A] \right)^{-1} [A]^T[W]\vec{d}
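A short sketch of the weighted normal equations (my own illustration; the function name, the toy input and the weight values are assumptions):

    import numpy as np

    def weighted_llms_filter(A, d, w):
        """Weighted llms filter: solve (A^T W A) f = A^T W d with diagonal weights w."""
        W = np.diag(w)
        return np.linalg.solve(A.T @ W @ A, A.T @ W @ d)

    # toy example: A is the convolution matrix of the input, d the desired output,
    # and w down-weights the later (presumably noisier) samples
    a = np.array([1.0, -0.8, 0.3])
    m = 5
    A = np.column_stack([np.concatenate([np.zeros(j), a, np.zeros(m - j)]) for j in range(m + 1)])
    d = np.zeros(A.shape[0]); d[0] = 1.0
    w = np.ones(A.shape[0]); w[len(a):] = 0.5
    f = weighted_llms_filter(A, d, w)
    print(np.round(A @ f, 3))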
Least-squares filtering in the frequency domain

signal: S(\nu) = A_S(\nu)\, e^{j\varphi_S(\nu)}
noise: N(\nu) = A_N(\nu)\, e^{j\varphi_N(\nu)}
filter: F(\nu) = A_F(\nu)\, e^{j\varphi_F(\nu)}
desired output: D(\nu) = A_D(\nu)\, e^{j\varphi_D(\nu)}

minimize:
E\{F(\nu)\} = \int_{-\infty}^{\infty} \left[ D(\nu) - F(\nu)\left( S(\nu) + N(\nu) \right) \right]
              \left[ D(\nu) - F(\nu)\left( S(\nu) + N(\nu) \right) \right]^{*} d\nu

assumption: the noise has random phase with a uniform distribution

E\{F(\nu)\} = \int_{-\infty}^{\infty} \left\{ A_F^2 \left( A_S^2 + A_N^2 \right)
              - 2 A_F A_S A_D \cos\!\left( \varphi_F + \varphi_S - \varphi_D \right) + A_D^2 \right\} d\nu

the filter that minimizes this expression has amplitude and phase spectrum:

A_F(\nu) = \frac{A_S(\nu)\, A_D(\nu)}{A_S^2(\nu) + A_N^2(\nu)},
\qquad
\varphi_F(\nu) = \varphi_D(\nu) - \varphi_S(\nu)
Least-squares filtering in the frequency domain with different objectives

F(\nu) = A_F(\nu)\, e^{j\varphi_F(\nu)}
\quad\text{with}\quad
A_F(\nu) = \frac{A_S(\nu)\, A_D(\nu)}{A_S^2(\nu) + A_N^2(\nu)}
\quad\text{and}\quad
\varphi_F(\nu) = \varphi_D(\nu) - \varphi_S(\nu)

(a) smoothing or signal-to-noise ratio enhancement

input data: X(\nu) = S(\nu) + N(\nu)
desired output: D(\nu) = S(\nu)

lls smoothing filter:
A_F(\nu) = \frac{A_S^2(\nu)}{A_S^2(\nu) + A_N^2(\nu)},
\qquad
\varphi_F(\nu) = 0

(b) whitening or lls inverse filtering

input data: X(\nu) = S(\nu) + N(\nu) = W(\nu) \cdot R(\nu) + N(\nu)
desired output: D(\nu) = 1 or D(\nu) = R(\nu)

lls inverse filter:
A_F(\nu) = \frac{A_S(\nu)}{A_S^2(\nu) + A_N^2(\nu)},
\qquad
\varphi_F(\nu) = -\varphi_S(\nu)

or, assuming a "white earth":
A_F(\nu) = \frac{A_W(\nu)}{A_W^2(\nu) + A_N^2(\nu)},
\qquad
\varphi_F(\nu) = -\varphi_W(\nu)
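A minimal sketch of these two frequency-domain filters for a known wavelet spectrum (my own illustration; the function name and the constant noise-power value are assumptions):

    import numpy as np

    def wiener_filters(W_spec, noise_power):
        """Frequency-domain lls filters for a known wavelet spectrum and noise power.
        Returns (smoothing filter, inverse/whitening filter) on the same frequency axis."""
        P = np.abs(W_spec) ** 2
        smoothing = P / (P + noise_power)                 # A^2 / (A^2 + A_N^2), zero phase
        inverse = np.conj(W_spec) / (P + noise_power)     # amplitude A/(A^2+A_N^2), phase -phi
        return smoothing, inverse

    w = np.array([1.0, -0.8, 0.3])
    W = np.fft.rfft(w, 128)
    smooth_F, inv_F = wiener_filters(W, noise_power=0.01)
    h = np.fft.irfft(inv_F, 128)                          # time-domain lls inverse filter
    print(np.round(np.convolve(h, w)[:5], 3))             # approximately (1, 0, 0, ...)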
Estimation of the signal spectrum and the noise spectrum

[Figure: two traces i and j of a CMP gather in a seismic section]

1. The seismic traces are modeled as: x_i(t) = s(t) + n_i(t), \; x_j(t) = s(t) + n_j(t)

2. Calculate the autocorrelation of the traces:
   R_{x_i x_i}(t) = R_{ss}(t) + R_{s n_i}(t) + R_{n_i s}(t) + R_{n_i n_i}(t)

3. Calculate the crosscorrelation of pairs of traces:
   R_{x_i x_j}(t) = R_{ss}(t) + R_{s n_j}(t) + R_{n_i s}(t) + R_{n_i n_j}(t)

4. The expectation values of the autocorrelations and crosscorrelations are:

   E\{ R_{x_i x_i}(t) \} = R_{ss}(t) + R_{nn}(t)
   E\{ R_{x_i x_j}(t) \} = R_{ss}(t)
   \quad\Rightarrow\quad R_{ss}(t) \text{ and } R_{nn}(t)
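A small synthetic sketch of this estimation (my own illustration; the trace model, correlation helper and noise level are assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    nt, ntr = 512, 50
    s = rng.standard_normal(nt)                        # common signal on all traces
    traces = s + 0.5 * rng.standard_normal((ntr, nt))  # x_i(t) = s(t) + n_i(t)

    def corr(x, y, maxlag=20):
        full = np.correlate(x, y, mode="full") / len(x)
        mid = len(x) - 1
        return full[mid : mid + maxlag + 1]

    auto = np.mean([corr(x, x) for x in traces], axis=0)                               # ~ R_ss + R_nn
    cross = np.mean([corr(traces[i], traces[i + 1]) for i in range(ntr - 1)], axis=0)  # ~ R_ss
    R_ss, R_nn = cross, auto - cross
    print(round(R_ss[0], 2), round(R_nn[0], 2))        # roughly 1.0 and 0.25 for this example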
Data-adaptive inverse filter design

Model of a trace: x(t) = w(t) \otimes r(t) + n(t)

Objective: design the inverse filter for the wavelet w(t)

Assumption 1: ignore the noise \;\rightarrow\; x(t) = w(t) \otimes r(t)
Assumption 2: the earth is "white" \;\rightarrow\; A_{rr}(t) = c\,\delta(t) \;\rightarrow\; A_{xx}(t) = c\,A_{ww}(t)
Assumption 3: the wavelet is minimum-phase \;\rightarrow\; A_{ww}(t) \Rightarrow w_{\min}^{-1}(t)
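Under these three assumptions the spiking filter can be designed directly from the trace autocorrelation. A minimal sketch (my own illustration; the function name, filter length, pre-whitening value and synthetic wavelet are assumptions, and scipy.linalg.solve_toeplitz is assumed to be available):

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def spiking_decon(trace, m=40, prewhitening=0.001):
        """Data-adaptive spiking deconvolution: the filter is built from the trace
        autocorrelation, assuming a white reflectivity and a minimum-phase wavelet."""
        acf = np.correlate(trace, trace, mode="full")[len(trace) - 1 : len(trace) + m].copy()
        acf[0] *= 1.0 + prewhitening                 # small diagonal load for stability
        g = np.zeros(m + 1); g[0] = acf[0]           # desired output: spike at zero lag
        f = solve_toeplitz(acf, g)
        return np.convolve(f, trace)[: len(trace)]

    rng = np.random.default_rng(2)
    r = rng.standard_normal(400) * (rng.random(400) < 0.1)   # sparse, roughly "white" reflectivity
    w = np.array([1.0, -0.9, 0.4, -0.1])                     # minimum-phase wavelet
    trace = np.convolve(w, r)[: len(r)]
    decon = spiking_decon(trace)                             # should resemble r up to a scale factor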
Pitfall: routine application of spiking deconvolution

[Figure, panel 1 – white earth reflectivity: good spiking deconvolution.
Earth's reflectivity * wavelet = input seismic trace; deconvolution whitens the input trace,
and the output trace matches the earth's reflectivity.]

[Figure, panel 2 – non-white earth reflectivity: bad spiking deconvolution.
Earth's reflectivity * input wavelet = input seismic trace; deconvolution whitens the input
trace, which distorts the earth's reflectivity in the output trace.]
Singular Value Decomposition: SVD                                    Aki and Richards, p. 677-699

A\vec{x} = \vec{d}, \qquad A \text{ is an } (n, m) \text{ matrix}, \qquad U\Lambda V^T \vec{x} = \vec{d}

A = U \Lambda V^T, \qquad A^T A = V \Lambda^2 V^T, \qquad \left( A^T A \right)^{-1} = V \Lambda^{-2} V^T

\left( A A^T \right) \vec{u}_i = \lambda_i^2\, \vec{u}_i \qquad \text{data eigenvectors (data space)}
\left( A^T A \right) \vec{v}_i = \lambda_i^2\, \vec{v}_i \qquad \text{model eigenvectors (model space)}
\text{with } i = 1, \ldots, p \text{ and } \lambda_{p+1} = \ldots = \lambda_m = 0

U^T U = U U^T = I_n, \qquad V^T V = V V^T = I_m

A = U \Lambda V^T = (U_p, U_0)
\begin{pmatrix} \Lambda_p & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} V_p^T \\ V_0^T \end{pmatrix}
= U_p \Lambda_p V_p^T

U_p = \vec{u}_1, \ldots, \vec{u}_p \; (n, p); \qquad U_0 = \vec{u}_{p+1}, \ldots, \vec{u}_n \; (n, n-p)
V_p = \vec{v}_1, \ldots, \vec{v}_p \; (m, p); \qquad V_0 = \vec{v}_{p+1}, \ldots, \vec{v}_m \; (m, m-p)
Singular Value Decomposition: SVD                                    Aki and Richards, p. 677-699

[Slide sketching the matrix dimensions of the economy-size factorization:]

A\vec{x} = \vec{d}, \qquad U \Lambda V^T \vec{x} = \vec{d}, \qquad A^T A = V \Lambda^2 V^T, \qquad \left( A^T A \right)^{-1} = V \Lambda^{-2} V^T

\left( A A^T \right) \vec{u}_i = \lambda_i^2\, \vec{u}_i, \qquad \left( A^T A \right) \vec{v}_i = \lambda_i^2\, \vec{v}_i, \qquad i = 1, \ldots, n, \qquad \lambda_{m+1} = \ldots = \lambda_n = 0

U^T U = V^T V = V V^T = I_m
\quad\text{(for the } (n, m) \text{ economy-size } U\text{; the full } (n, n) \text{ } U \text{ also satisfies } U U^T = I_n \text{)}
Singular Value Decomposition: SVD - Example

A = U \Lambda V^T \text{ with dimensions } (n,m) = (n,n)(n,m)(m,m), \qquad
A = \begin{pmatrix} 2 & 4 \\ 1 & 3 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}

A A^T = \begin{pmatrix} 20 & 14 & 0 & 0 \\ 14 & 10 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\text{ has eigenvalues } \lambda^2_{1,2,3,4} \approx 29.87,\; 0.13,\; 0,\; 0 \text{ and eigenvectors }
\begin{pmatrix} 0.82 \\ 0.58 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} -0.58 \\ 0.82 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}

A^T A = \begin{pmatrix} 5 & 11 \\ 11 & 25 \end{pmatrix}
\text{ has eigenvalues } \lambda^2_{1,2} \approx 29.87,\; 0.13 \text{ and eigenvectors }
\begin{pmatrix} 0.40 \\ 0.91 \end{pmatrix},
\begin{pmatrix} -0.91 \\ 0.40 \end{pmatrix}

A = \begin{pmatrix} 2 & 4 \\ 1 & 3 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\approx
\begin{pmatrix} 0.82 & -0.58 & 0 & 0 \\ 0.58 & 0.82 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 5.47 & 0 \\ 0 & 0.37 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\cdot
\begin{pmatrix} 0.40 & 0.91 \\ -0.91 & 0.40 \end{pmatrix}
= U\,\Lambda\,V^T

The reduced form A = U_p \Lambda_p V_p^T uses only the first two columns of U and the (2, 2) block of \Lambda.

U U^T = U^T U = I_4, \qquad V V^T = V^T V = I_2
A A^T = U \Lambda V^T V \Lambda^T U^T = U \Lambda \Lambda^T U^T
A^T A = V \Lambda^T U^T U \Lambda V^T = V \Lambda^T \Lambda V^T
                                                              Aki & Richards, Ch. 12
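The numbers of this example are easy to check numerically; a minimal sketch (my own illustration) using NumPy's SVD:

    import numpy as np

    A = np.array([[2.0, 4.0],
                  [1.0, 3.0],
                  [0.0, 0.0],
                  [0.0, 0.0]])

    U, s, Vt = np.linalg.svd(A)            # full SVD: U is 4x4, Vt is 2x2
    print(np.round(s, 2))                  # singular values ~ [5.47, 0.37]
    print(np.round(s ** 2, 2))             # eigenvalues of A^T A ~ [29.87, 0.13]

    # reduced (economy) form A = U_p Lambda_p V_p^T
    Up, Lp, Vpt = U[:, :2], np.diag(s), Vt
    print(np.allclose(A, Up @ Lp @ Vpt))   # True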
Singular Value Decomposition: SVD - Example

A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix},
\qquad
A = (U_p, U_0)
\begin{pmatrix} \Lambda_p & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} V_p^T \\ V_0^T \end{pmatrix}
= U_p \Lambda_p V_p^T

A A^T \vec{u}_i = \lambda_i^2 \vec{u}_i, \qquad A^T A \vec{v}_i = \lambda_i^2 \vec{v}_i

A A^T = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\qquad
\det\!\begin{pmatrix} 2-\lambda^2 & 0 & 0 \\ 0 & 1-\lambda^2 & -1 \\ 0 & -1 & 1-\lambda^2 \end{pmatrix}
= \left( 2-\lambda^2 \right)\left( \lambda^2 - 2 \right)\lambda^2 = 0
\;\Rightarrow\;
\lambda_1^2 = \lambda_2^2 = 2, \; \lambda_3^2 = 0

A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 0 & 1 & 0 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix}}_{\vec{u}_1\;\;\vec{u}_2\;\;\vec{u}_3}
\begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}
\underbrace{\begin{pmatrix} 0 & 0 & 1 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{pmatrix}}_{\vec{v}_1^T,\;\vec{v}_2^T,\;\vec{v}_3^T}

Reduced form (p = 2):

A = \begin{pmatrix} 0 & 1 \\ 1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \end{pmatrix}
                                                              Aki & Richards, p. 682-683
Singular Value Decomposition: SVD - Example (eigenimages)

X = U \Lambda V^T = \sum_i \lambda_i\, \vec{u}_i \vec{v}_i^{\,T} = X_1 + X_2 + X_3 \qquad \text{(eigenimages)}

For the first example:

X = \begin{pmatrix} 2 & 4 \\ 1 & 3 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\approx
\begin{pmatrix} 0.82 & -0.58 & 0 & 0 \\ 0.58 & 0.82 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 5.47 & 0 \\ 0 & 0.37 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 0.40 & 0.91 \\ -0.91 & 0.40 \end{pmatrix}

X \approx
\begin{pmatrix} 1.7942 & 4.0817 \\ 1.269 & 2.8871 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
+
\begin{pmatrix} 0.1953 & -0.0858 \\ -0.2761 & 0.1214 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}

For the second example:

X = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix}
+ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
Least-squares solution expressed in terms of SVD

A\vec{x} = \vec{d} \qquad \text{with} \qquad A = U \Lambda V^T = U_p \Lambda_p V_p^T

A_L^{-1} = \left( A^T A \right)^{-1} A^T = V \Lambda^{-1} U^T = V_p \Lambda_p^{-1} U_p^T

1. Unconstrained least-squares

\min \left\| A\vec{x} - \vec{d} \right\|^2
\;\Rightarrow\;
\vec{x} = \left( A^T A \right)^{-1} A^T \vec{d} = A_L^{-1} \vec{d} = V_p \Lambda_p^{-1} U_p^T \vec{d}

\vec{x} = \frac{1}{\lambda_1} \vec{v}_1 \left( \vec{u}_1^T \vec{d} \right)
        + \frac{1}{\lambda_2} \vec{v}_2 \left( \vec{u}_2^T \vec{d} \right) + \ldots
        + \frac{1}{\lambda_p} \vec{v}_p \left( \vec{u}_p^T \vec{d} \right)
      = \frac{\alpha_1}{\lambda_1} \vec{v}_1 + \frac{\alpha_2}{\lambda_2} \vec{v}_2 + \ldots + \frac{\alpha_p}{\lambda_p} \vec{v}_p
\qquad \left( \alpha_i = \vec{u}_i^T \vec{d} \right)

2. Marquardt-Levenberg method; constrained least-squares with damping factor \beta

\min \left\{ \left\| A\vec{x} - \vec{d} \right\|^2 + \beta \left\| \vec{x} \right\|^2 \right\}
\;\Rightarrow\;
\vec{x} = V \left( \Lambda^2 + \beta I \right)^{-1} \Lambda\, U^T \vec{d}
        = V \,\mathrm{diag}\!\left( \frac{\lambda_j}{\lambda_j^2 + \beta} \right) U^T \vec{d}
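Both forms are obtained from the same SVD; a minimal sketch (my own illustration; the function name and the random test problem are assumptions):

    import numpy as np

    def svd_least_squares(A, d, beta=0.0):
        """Least-squares solution via SVD; beta > 0 gives Marquardt-Levenberg damping."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        weights = s / (s ** 2 + beta)            # reduces to 1/lambda_i for beta = 0
        return Vt.T @ (weights * (U.T @ d))

    rng = np.random.default_rng(3)
    A = rng.standard_normal((20, 5))
    d = rng.standard_normal(20)
    x_ls = svd_least_squares(A, d)               # matches np.linalg.lstsq(A, d)[0]
    x_damped = svd_least_squares(A, d, beta=0.5) # shorter solution vector, larger residual
    print(np.allclose(x_ls, np.linalg.lstsq(A, d, rcond=None)[0]))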
Least-squares solution expressed in terms of SVD

\vec{x} = \left( A^T A \right)^{-1} A^T \vec{d} = A_L^{-1} \vec{d}
\quad\text{minimizes}\quad
\left( A\vec{x} - \vec{d} \right)^T \left( A\vec{x} - \vec{d} \right),
\qquad\text{with}\qquad A = U \Lambda V^T

A_L^{-1} = \left( A^T A \right)^{-1} A^T = V \Lambda^{-1} U^T
\qquad\text{or, instead,}\qquad
V \left( \Lambda^2 + \beta I \right)^{-1} \Lambda\, U^T

\vec{x} = A_L^{-1} \vec{d} = V \Lambda^{-1} U^T \vec{d}
        = \frac{1}{\lambda_1} \vec{v}_1 \left( \vec{u}_1^T \vec{d} \right)
        + \frac{1}{\lambda_2} \vec{v}_2 \left( \vec{u}_2^T \vec{d} \right) + \ldots
        + \frac{1}{\lambda_m} \vec{v}_m \left( \vec{u}_m^T \vec{d} \right)
        = \frac{\alpha_1}{\lambda_1} \vec{v}_1 + \frac{\alpha_2}{\lambda_2} \vec{v}_2 + \ldots + \frac{\alpha_m}{\lambda_m} \vec{v}_m

or, with damping,

\vec{x} = \frac{\lambda_1}{\lambda_1^2 + \beta} \vec{v}_1 \left( \vec{u}_1^T \vec{d} \right)
        + \frac{\lambda_2}{\lambda_2^2 + \beta} \vec{v}_2 \left( \vec{u}_2^T \vec{d} \right) + \ldots
        + \frac{\lambda_m}{\lambda_m^2 + \beta} \vec{v}_m \left( \vec{u}_m^T \vec{d} \right)
Resolution matrix and Covariance matrix

The problem: G\vec{m} = \vec{d}
A particular solution: \vec{m}_p = G_p^{-1} \vec{d} \;\Rightarrow\; \vec{m}_p = G_p^{-1} G \vec{m}

The resolution matrix: G_p^{-1} G

which expresses the particular solution as a weighted average of the true solution with
weights given by the row vectors of matrix G_p^{-1}G.
If G_p^{-1}G is the identity matrix I, resolution is perfect and the particular solution is
equal to the true solution. If the row vectors of G_p^{-1}G have components spread around
the diagonal (with low values elsewhere), the particular solution represents a smoothed
solution.

The covariance matrix for \Delta\vec{m}_p in terms of the covariance matrix of the error in the data
(~ means conjugate transpose, \langle\;\rangle means averaging):

\left\langle \Delta\vec{m}_p\, \Delta\tilde{\vec{m}}_p \right\rangle
  = G_p^{-1} \left\langle \Delta\vec{d}\, \Delta\tilde{\vec{d}} \right\rangle \tilde{G}_p^{-1}
  \;\rightarrow\; \sigma_d^2\, G_p^{-1} \tilde{G}_p^{-1}

Thus, once the operator G_p^{-1} for a particular solution is known, the resolution and the
error in the solution are easily obtained.                                   A&R, p. 677
Resolution matrix and Covariance matrix

The problem: G \cdot \vec{m} = \vec{d} \;\rightarrow\; U \Lambda \tilde{V} \cdot \vec{m} = \vec{d}
(G is an (N, M) matrix, \vec{m} has M components, \vec{d} has N components)

The generalized inverse matrix (for M < N: U_0 exists, V_0 does not exist):

G_g^{-1} = \left( \tilde{G} G \right)^{-1} \tilde{G} = V_M \Lambda_M^{-1} \tilde{U}_M

A particular solution: \vec{m}_p = G_p^{-1} \cdot \vec{d} = V_p \Lambda_p^{-1} \tilde{U}_p \cdot \vec{d} \qquad \text{with } p \le M

The resolution matrix: G_p^{-1} G = V_p \tilde{V}_p

The covariance matrix: \sigma_d^2\, G_p^{-1} \tilde{G}_p^{-1} = \sigma_d^2\, V_p \Lambda_p^{-2} \tilde{V}_p

Minimum energy: E_{\min} = \tilde{\vec{d}} \left( U_p \tilde{U}_p - I \right)^{\sim} \left( U_p \tilde{U}_p - I \right) \vec{d}
                                                              A&R, p. 677
Singular Value Decomposition (SVD)

1. Start with the linear equation:

\vec{d} = G \vec{m}
\quad\text{or}\quad
d_i = \sum_{j=1}^{M} G_{ij}\, m_j, \qquad i = 1, \ldots, N, \quad N > M
\qquad\qquad
\tilde{G} = \left( G^{*} \right)^T \;\text{(conjugate transpose)}

2. A particular solution is:

\vec{m}_p = G_p^{-1} \vec{d} \;\rightarrow\; \vec{m}_p = G_p^{-1} G \vec{m}

the particular solution is a weighted average of the true solution.

3. G_p^{-1}G is the resolution matrix; G_p^{-1}G = I means perfect resolution.

4. The error \Delta\vec{m}_p due to the error \Delta\vec{d} in the data is described by the covariance matrix:

\left\langle \Delta\vec{m}_p\, \Delta\tilde{\vec{m}}_p \right\rangle
  = G_p^{-1} \left\langle \Delta\vec{d}\, \Delta\tilde{\vec{d}} \right\rangle \tilde{G}_p^{-1}

Thus, once G_p^{-1} for a particular solution is known, the resolution and the error in the solution
are easily obtained.
                                                              Aki and Richards, p. 677-699
Singular Value Decomposition (SVD)                                    Aki and Richards, p. 677-699

G\vec{m} = \vec{d} \qquad \text{with} \qquad \vec{m} = (m_1, \ldots, m_M)^T \quad\text{and}\quad \vec{d} = (d_1, \ldots, d_N)^T

Construct the (N+M, N+M) Hermitian matrix (S = \tilde{S} = \left( S^{*} \right)^T, i.e. s_{ij} = s_{ji}^{*}):

S = \begin{pmatrix} 0 & G \\ \tilde{G} & 0 \end{pmatrix}

Hermiticity assures the existence of an orthogonal set of eigenvectors \vec{w}_i (i = 1, \ldots, N+M)
with eigenvalues \lambda_i which satisfy S\vec{w}_i = \lambda_i \vec{w}_i (i = 1, \ldots, N+M).

Eigenvalues are the solution of \det(S - \lambda I) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_{N+M}) = 0.

Write \vec{w}_i as the sum of two vectors \vec{w}_i = \vec{u}_i + \vec{v}_i; then

\begin{pmatrix} 0 & G \\ \tilde{G} & 0 \end{pmatrix}
\begin{pmatrix} \vec{u}_i \\ \vec{v}_i \end{pmatrix}
= \lambda_i \begin{pmatrix} \vec{u}_i \\ \vec{v}_i \end{pmatrix}

If \lambda_i is a nonzero eigenvalue, we get the following coupled equations for the eigenvector
pair (\vec{u}_i, \vec{v}_i):

G\vec{v}_i = \lambda_i \vec{u}_i
\qquad
\tilde{G}\vec{u}_i = \lambda_i \vec{v}_i
\qquad\qquad (12.92)

This pair of equations is also satisfied by the pair (-\vec{u}_i, \vec{v}_i) with eigenvalue -\lambda_i.
There are p pairs of nonzero eigenvalues \pm\lambda_i; the corresponding eigenvector pairs are:

(\vec{u}_i, \vec{v}_i) \text{ for } \lambda_i, \quad i = 1, \ldots, p
(-\vec{u}_i, \vec{v}_i) \text{ for } -\lambda_i, \quad i = 1, \ldots, p
Singular Value Decomposition (SVD)

For zero eigenvalues, equation (12.92) is decoupled, and \vec{u}_i and \vec{v}_i become independent:

G\vec{v}_i = 0, \quad i = p+1, \ldots, M
\qquad
\tilde{G}\vec{u}_i = 0, \quad i = p+1, \ldots, N
\qquad\qquad (12.93)

Thus, among the N+M eigenvalues of S\vec{w} = \lambda\vec{w}, 2p are nonzero and the remaining N+M-2p are zero.

The data space, spanned by \vec{u}_i (i = 1, \ldots, N), and the model space, spanned by \vec{v}_i (i = 1, \ldots, M),
are coupled only through the nonzero eigenvalues \pm\lambda_i (i = 1, \ldots, p).

From (12.92), we find that

\tilde{G}G\vec{v}_i = \lambda_i^2 \vec{v}_i
\qquad
G\tilde{G}\vec{u}_i = \lambda_i^2 \vec{u}_i

Since \tilde{G}G and G\tilde{G} are both Hermitian, each of the \vec{v}_i and \vec{u}_i forms an orthogonal set of
eigenvectors with real eigenvalues. After normalization, we can write:

\tilde{\vec{v}}_i \vec{v}_j = \delta_{ij}, \quad i, j = 1, \ldots, M
\qquad
\tilde{\vec{u}}_i \vec{u}_j = \delta_{ij}, \quad i, j = 1, \ldots, N
                                                              Aki and Richards, p. 677-699
Singular Value Decomposition (SVD)

Define a matrix V with column vectors \vec{v}_i and a matrix U with column vectors \vec{u}_i:

V = \begin{pmatrix} v_{11} & \cdots & \cdots & v_{1M} \\ \vdots & & \vec{v}_i & \vdots \\ v_{M1} & \cdots & \cdots & v_{MM} \end{pmatrix}
\qquad
U = \begin{pmatrix} u_{11} & \cdots & \cdots & u_{1N} \\ \vdots & & \vec{u}_i & \vdots \\ u_{N1} & \cdots & \cdots & u_{NN} \end{pmatrix}

Then \tilde{U}U = U\tilde{U} = I_N and \tilde{V}V = V\tilde{V} = I_M.

Divide U into U_p and U_0, where U_p is made up of the eigenvectors with nonzero eigenvalues
and U_0 consists of the eigenvectors with zero eigenvalues. Likewise, V is divided into V_p and V_0:

U_p = \begin{pmatrix} u_{11} & \cdots & u_{1p} \\ \vdots & & \vdots \\ u_{N1} & \cdots & u_{Np} \end{pmatrix}
\qquad
V_p = \begin{pmatrix} v_{11} & \cdots & v_{1p} \\ \vdots & & \vdots \\ v_{M1} & \cdots & v_{Mp} \end{pmatrix}

\tilde{U}_p U_p = \tilde{V}_p V_p = I_p, \qquad U_p\tilde{U}_p \ne I_N, \;\; V_p\tilde{V}_p \ne I_M
                                                              Aki and Richards, p. 677-699
Singular Value Decomposition (SVD)                                    Aki and Richards, p. 677-699

Introduce the diagonal matrix \Lambda_p whose elements are the nonzero eigenvalues \lambda_1, \ldots, \lambda_p.
Then (12.92) and (12.93) can be rewritten as:

GV_p = U_p \Lambda_p
\qquad
\tilde{G}U_p = V_p \Lambda_p
\qquad
GV_0 = 0
\qquad
\tilde{G}U_0 = 0

GV = G(V_p, V_0) = (U_p, U_0) \begin{pmatrix} \Lambda_p & 0 \\ 0 & 0 \end{pmatrix}

Since V\tilde{V} = I_M we have

G = (U_p, U_0) \begin{pmatrix} \Lambda_p & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \tilde{V}_p \\ \tilde{V}_0 \end{pmatrix} = U_p \Lambda_p \tilde{V}_p

This is an important factorization of G. It shows that G can be constructed from U_p and V_p
alone. The U_0 and V_0 spaces are blind spots not illuminated by the operator G.
Singular Value Decomposition (SVD)                                    Aki and Richards, p. 677-699

G\vec{m} = \vec{d}

G = U \Lambda \tilde{V} = (U_p, U_0) \begin{pmatrix} \Lambda_p & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \tilde{V}_p \\ \tilde{V}_0 \end{pmatrix} = U_p \Lambda_p \tilde{V}_p

G\vec{v}_i = \lambda_i \vec{u}_i, \quad i = 1, \ldots, p
\qquad
G\vec{v}_i = 0, \quad i = p+1, \ldots, M
\qquad
\vec{v}_i \in C^M, \; i = 1, \ldots, M

\tilde{G}\vec{u}_i = \lambda_i \vec{v}_i, \quad i = 1, \ldots, p
\qquad
\tilde{G}\vec{u}_i = 0, \quad i = p+1, \ldots, N
\qquad
\vec{u}_i \in C^N, \; i = 1, \ldots, N

Model space: V_p = \vec{v}_1, \ldots, \vec{v}_p and V_0 = \vec{v}_{p+1}, \ldots, \vec{v}_M.
The V_0 part of the model cannot be resolved: it can be added to the model without
contradicting the data, and it therefore causes non-uniqueness.

Data space: U_p = \vec{u}_1, \ldots, \vec{u}_p and U_0 = \vec{u}_{p+1}, \ldots, \vec{u}_N.
Data in the U_0 space cannot be explained by the model, i.e. it is the source of the discrepancy
between the data and the prediction by G.
SVD and the generalized inverse G_g^{-1}; case I: N > M = p                  Aki and Richards, p. 677-699

The exact inverse of G = U\Lambda\tilde{V}, when it exists, can be written as G^{-1} = V\Lambda^{-1}\tilde{U}.
Therefore, it is natural to consider the following expression as an inverse operator to the
operator G = U_p \Lambda_p \tilde{V}_p:

G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p

I. Consider the case in which there is no V_0 but the U_0-space exists. Then \tilde{G}G = V_p \Lambda_p^2 \tilde{V}_p
will have the exact inverse \left( \tilde{G}G \right)^{-1} = V_p \Lambda_p^{-2} \tilde{V}_p and the least-squares method is applicable.
The normal equations are written as:

\tilde{G}G\vec{m} = \tilde{G}\vec{d}

And the solution \vec{m}_g is given by

\vec{m}_g = \left( \tilde{G}G \right)^{-1} \tilde{G}\vec{d}
          = V_p \Lambda_p^{-2} \tilde{V}_p \cdot V_p \Lambda_p \tilde{U}_p \vec{d}
          = V_p \Lambda_p^{-1} \tilde{U}_p \vec{d}
          = G_g^{-1} \vec{d}

Thus, in this case, the generalized inverse is nothing but the least-squares solution, in which
the sum of squares of the residuals, \left\| \vec{d} - G\vec{m} \right\|^2, is minimized.

Putting \vec{m}_g = G_g^{-1}\vec{d} we have

\vec{d} - G\vec{m}_g = \vec{d} - U_p \Lambda_p \tilde{V}_p V_p \Lambda_p^{-1} \tilde{U}_p \vec{d} = \vec{d} - U_p \tilde{U}_p \vec{d}

Since \tilde{U}_p U_p = I, we find that

\tilde{U}_p \left( \vec{d} - G\vec{m}_g \right) = \tilde{U}_p \vec{d} - \tilde{U}_p U_p \tilde{U}_p \vec{d} = 0

\vec{d} - G\vec{m}_g has no components in U_p-space;
G\vec{m}_g has no components in U_0-space.

[Figure: \vec{d} decomposed into G\vec{m}_g along U_p and the residual \vec{d} - G\vec{m}_g along U_0]
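A minimal numerical sketch of case I (my own illustration; the random full-column-rank test matrix is an assumption):

    import numpy as np

    rng = np.random.default_rng(4)
    G = rng.standard_normal((6, 3))                 # N > M = p: full column rank, U_0 exists, no V_0
    d = rng.standard_normal(6)

    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    m_g = Vt.T @ ((U.T @ d) / s)                    # m_g = V_p Lambda_p^-1 U_p~ d

    print(np.allclose(m_g, np.linalg.pinv(G) @ d))  # generalized inverse = least-squares solution
    residual = d - G @ m_g
    print(np.round(U.T @ residual, 10))             # ~0: residual has no component in U_p space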
SVD and the generalized inverse G_g^{-1}; case II: N > M > p

The exact inverse of G = U\Lambda\tilde{V}, when it exists, can be written as G^{-1} = V\Lambda^{-1}\tilde{U}.
Therefore, it is natural to consider the following expression as an inverse operator to the
operator G = U_p \Lambda_p \tilde{V}_p:

G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p

II. Consider the case in which both the V_0- and U_0-spaces exist. Then the generalized inverse
G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p will simultaneously minimize:

\left\| \vec{d} - G\vec{m} \right\|^2 \;\text{in the data space}
\qquad\text{and}\qquad
\left\| \vec{m} \right\|^2 \;\text{in the model space.}

The generalized inverse solution is: \vec{m}_g = G_g^{-1}\vec{d}
                                                              Aki and Richards, p. 677-699
The generalized inverse solution and resolution and error

Resolution in model space (resolution \leftrightarrow uniqueness; covariance \leftrightarrow reliability)

We have the generalized inverse solution \vec{m}_g and the true (earth) model \vec{m} with
\vec{m}_g = G_g^{-1}\vec{d} and \vec{d} = G\vec{m}; therefore we can write:

\vec{m}_g = G_g^{-1} G \vec{m}

When the data vector \vec{d} has a component in the U_0 space, the equation \vec{d} = G\vec{m} does not hold.
The above relation between \vec{m}_g and \vec{m} is valid even in that case, because \tilde{U}_p U_0 = 0 and
the operator G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p annihilates the U_0-space anyway. With G = U_p \Lambda_p \tilde{V}_p we can
write:

\vec{m}_g = V_p \Lambda_p^{-1} \tilde{U}_p U_p \Lambda_p \tilde{V}_p \vec{m} = V_p \tilde{V}_p \vec{m}

If there is no V_0-space, V_p\tilde{V}_p = I and \vec{m}_g = \vec{m}. Thus when V_0 = 0, the solution is unique
whether the U_0-space exists or not.

The matrix V_p\tilde{V}_p is the resolution matrix.
Each row vector of V_p\tilde{V}_p is the closest to a delta function (unit diagonal element and
zeros otherwise) in a least-squares sense.
The diagonal elements of V_p\tilde{V}_p are useful measures of resolution. The total sum of the
diagonal elements (tr V_p\tilde{V}_p) is equal to p (there are p unit eigenvalues).
                                                              Aki and Richards, p. 677-699
The generalized inverse solution and resolution and error

Resolution in data space (resolution \leftrightarrow uniqueness; covariance \leftrightarrow reliability)

We have the generalized inverse solution \vec{m}_g and the true (earth) model \vec{m} with
\vec{m}_g = G_g^{-1}\vec{d} and \vec{d} = G\vec{m}; therefore we can write: \vec{m}_g = G_g^{-1}G\vec{m}.

The observed data vector \vec{d} can be related to the data \vec{d}_g predicted by the generalized inverse
as follows:

\vec{d}_g = G G_g^{-1} \vec{d} = U_p \tilde{U}_p \vec{d}

Thus, when there is no U_0, U_p\tilde{U}_p = I and a perfect fit is obtained between the observed
and predicted data. If U_0 exists, a discrepancy between them occurs, and the predicted data are
expressed as a weighted average of the observed data. The weighting coefficients are given by
the row vectors of U_p\tilde{U}_p.

Since \tilde{U}_0 \vec{d}_g = 0, N-p constraints exist among the N components of \vec{d}_g. Therefore,
only p components of the predicted data vector \vec{d}_g are independent. Since tr U_p\tilde{U}_p = p,
the diagonal elements of U_p\tilde{U}_p can be used to divide the data into p portions, to each of
which one independent prediction can be assigned. To make more predictions is meaningless,
because they give only redundant information.
                                                              Aki and Richards, p. 677-699
The generalized inverse solution and resolution and error               Aki and Richards, p. 677-699

(resolution \leftrightarrow uniqueness; covariance \leftrightarrow reliability)

The reliability of the solution is measured by its covariance matrix. The error \Delta\vec{m}_g in the
solution due to the error \Delta\vec{d} in the data can be written as:

\Delta\vec{m}_g = G_g^{-1} \Delta\vec{d}

Therefore, their covariance matrices are related by

\left\langle \Delta\vec{m}_g\, \Delta\tilde{\vec{m}}_g \right\rangle = G_g^{-1} \left\langle \Delta\vec{d}\, \Delta\tilde{\vec{d}} \right\rangle \tilde{G}_g^{-1}

Assuming that all components of the data vector are statistically independent and share the
same variance \sigma_d^2, we have

\left\langle \Delta\vec{m}_g\, \Delta\tilde{\vec{m}}_g \right\rangle = \sigma_d^2\, G_g^{-1} \tilde{G}_g^{-1}

For N > M = p, i.e. U_0 \ne 0 and V_0 = 0, we can write, with G_g^{-1} = \left( \tilde{G}G \right)^{-1} \tilde{G}:

\left\langle \Delta\vec{m}_g\, \Delta\tilde{\vec{m}}_g \right\rangle = \sigma_d^2 \left( \tilde{G}G \right)^{-1}

In general, putting G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p gives:

\left\langle \Delta\vec{m}_g\, \Delta\tilde{\vec{m}}_g \right\rangle
 = \sigma_d^2\, V_p \Lambda_p^{-1} \tilde{U}_p U_p \Lambda_p^{-1} \tilde{V}_p
 = \sigma_d^2\, V_p \Lambda_p^{-2} \tilde{V}_p

Eigenvectors with small eigenvalues can be eliminated from the solution in order to keep the
covariance small; this, however, degrades the resolution in both model and data spaces.
The maximum-likelihood inverse                                          Aki and Richards, p. 677-699

The probability density function for the multivariate Gaussian distribution with covariance
matrix R_{dd} is written as:

f\!\left( \vec{d} \right) = \frac{\left| R_{dd}^{-1} \right|^{1/2}}{(2\pi)^{N/2}}
\exp\!\left[ -\frac{1}{2} \left( \vec{d} - G\vec{m} \right)^{*T} R_{dd}^{-1} \left( \vec{d} - G\vec{m} \right) \right]

In order to maximize the likelihood function we must minimize
\left( \vec{d} - G\vec{m} \right)^{*T} R_{dd}^{-1} \left( \vec{d} - G\vec{m} \right)
instead of \left\| \vec{d} - G\vec{m} \right\|^2.
In other words, one has to minimize the weighted sum of the squared residuals, with the weight
matrix being the inverse of the data-covariance matrix.

The generalized inverse minimizes \left\| \vec{d} - G\vec{m} \right\|^2. Only if the data covariance matrix R_{dd}
is equal to \sigma_d^2 I does the maximum-likelihood estimate equal the generalized-inverse estimate.

The maximum-likelihood inverse minimizes:
\left( \vec{d} - G\vec{m} \right)^{*T} R_{dd}^{-1} \left( \vec{d} - G\vec{m} \right) \;\text{in data space}
\qquad\text{and}\qquad
\vec{m}^{*T} R_{mm}^{-1} \vec{m} \;\text{in model space.}

The matrices R_{dd} and R_{mm} are positive definite.

R_{mm} expresses: a priori knowledge, different physical dimensions, smoothness of fluctuations.
The stochastic inverse                                                  Aki and Richards, p. 677-699

We consider that the data consist of signal and noise: \vec{d} = G\vec{m} + \vec{n}

and that both \vec{m} and \vec{n} are stochastic processes with
\left\langle \vec{m} \right\rangle = \left\langle \vec{n} \right\rangle = 0,
\left\langle \vec{m}\tilde{\vec{m}} \right\rangle = R_{mm} and
\left\langle \vec{n}\tilde{\vec{n}} \right\rangle = R_{nn}.

The stochastic inverse operator L is determined by minimizing the statistical average of the
discrepancy between \vec{m} and L\vec{d}.

Consider repeated experiments in which \vec{m} and \vec{n} are generated. Suppose their sample values
at the k-th experiment are \vec{m}^{(k)} and \vec{n}^{(k)}. For each experiment, we compute L\vec{d}, and seek L
which minimizes:

\frac{1}{n} \sum_{k=1}^{n} \left( m_i^{(k)} - \sum_{j=1}^{N} L_{ij}\, d_j^{(k)} \right)^2

Differentiation with respect to L_{ij} and equating to zero leads to the normal equations:

\left\langle \vec{m}\tilde{\vec{d}} \right\rangle = L \left\langle \vec{d}\tilde{\vec{d}} \right\rangle
\qquad\text{or}\qquad
L = R_{md} R_{dd}^{-1}

If \vec{m} and \vec{n} are uncorrelated ( \left\langle \vec{m}\tilde{\vec{n}} \right\rangle = 0 ), we obtain:
R_{dd} = \left\langle \vec{d}\tilde{\vec{d}} \right\rangle = G R_{mm} \tilde{G} + R_{nn}
and R_{md} = R_{mm}\tilde{G}.

Eventually this leads to:

L = R_{mm}\tilde{G} \left( G R_{mm}\tilde{G} + R_{nn} \right)^{-1}
The stochastic inverse                                                  Aki and Richards, p. 677-699

A special case of the stochastic inverse, in which R_{mm} = \sigma_m^2 I and R_{nn} = \sigma_n^2 I,
gives a good approximation to the generalized inverse:

L_0 = \tilde{G} \left( G\tilde{G} + \varepsilon^2 I \right)^{-1}
\qquad\text{where}\qquad
\varepsilon^2 = \sigma_n^2 / \sigma_m^2

In terms of eigenvectors we write:

\left( G\tilde{G} + \varepsilon^2 I \right)^{-1}
= (U_p, U_0)
\begin{pmatrix} \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-1} & 0 \\ 0 & \varepsilon^{-2} I \end{pmatrix}
\begin{pmatrix} \tilde{U}_p \\ \tilde{U}_0 \end{pmatrix}
= U_p \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-1} \tilde{U}_p + U_0\, \varepsilon^{-2} \tilde{U}_0

Since \tilde{G} = V_p \Lambda_p \tilde{U}_p and \tilde{U}_p U_0 = 0, we find

L_0 = \tilde{G} \left( G\tilde{G} + \varepsilon^2 I \right)^{-1} = V_p \Lambda_p \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-1} \tilde{U}_p

(This is an approximation to G_g^{-1} = V_p \Lambda_p^{-1} \tilde{U}_p.)

The contributions of eigenvectors with eigenvalues \lambda_i^2 smaller than \varepsilon^2 are suppressed in the
stochastic inverse.

Because \left( \tilde{G}G + \varepsilon^2 I \right)^{-1} = V_p \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-1} \tilde{V}_p + V_0\, \varepsilon^{-2} \tilde{V}_0 and \tilde{V}_0 V_p = 0, we can also write:

L_0 = \left( \tilde{G}G + \varepsilon^2 I \right)^{-1} \tilde{G}

This inverse is known as the Marquardt-Levenberg damped least-squares solution.
It is obtained by minimizing the sum of the squares of the data residual and of the model
parameters, with weights inversely proportional to their variances, i.e.

\sigma_n^{-2} \left\| \vec{d} - G\vec{m} \right\|^2 + \sigma_m^{-2} \left\| \vec{m} \right\|^2

where again \varepsilon^2 = \sigma_n^2 / \sigma_m^2.
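A minimal numerical sketch of the two equivalent forms of L_0 (my own illustration; the function name, the random test matrix and the value of \varepsilon^2 are assumptions):

    import numpy as np

    def stochastic_inverse(G, eps2):
        """Damped least-squares (stochastic inverse) operator L0 for R_mm = s_m^2 I, R_nn = s_n^2 I."""
        N, M = G.shape
        L0_data = G.T @ np.linalg.inv(G @ G.T + eps2 * np.eye(N))    # G~ (G G~ + eps^2 I)^-1
        L0_model = np.linalg.inv(G.T @ G + eps2 * np.eye(M)) @ G.T   # (G~ G + eps^2 I)^-1 G~
        return L0_data, L0_model

    rng = np.random.default_rng(5)
    G = rng.standard_normal((8, 4))
    L_data, L_model = stochastic_inverse(G, eps2=0.1)
    print(np.allclose(L_data, L_model))      # True: the two expressions for L0 coincide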
The stochastic inverse                                                  Aki and Richards, p. 677-699

The resolution matrix for L_0 is given by

L_0 G = V_p\, \Lambda_p^2 \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-1} \tilde{V}_p

The trace of L_0 G, which is a measure of resolution in model space, can be written as

\operatorname{tr}\, L_0 G = \sum_{i=1}^{p} \frac{\lambda_i^2}{\lambda_i^2 + \varepsilon^2} < p

Thus the introduction of \varepsilon^2 will degrade resolution, but will stabilize the solution by reducing
the covariance.

The covariance matrix is given by

\left\langle \Delta\vec{m}\, \Delta\tilde{\vec{m}} \right\rangle = \sigma_d^2\, L_0 \tilde{L}_0
= \sigma_d^2\, V_p\, \Lambda_p^2 \left( \Lambda_p^2 + \varepsilon^2 I \right)^{-2} \tilde{V}_p

where \sigma_d^2 is the variance of the error \Delta\vec{d} in the data \vec{d}, assuming a uniform and independent
error for each individual measurement. In our stochastic model, \Delta\vec{d} corresponds to
\vec{n} = \vec{d} - G\vec{m} and \sigma_d^2 = \sigma_n^2.

Thus the increase in \varepsilon^2 reduces the error of the model-parameter estimates, thereby sacrificing
the resolution. In the stochastic inverse scheme, the best choice of \varepsilon^2 is \sigma_n^2 / \sigma_m^2 (the ratio
of noise variance to model variance).
