Lecture 16
$$\sigma_\varepsilon^2 = E\{\,|Y-\phi(X)|^2\,\} = E_X\big[\,E_Y\{\,|Y-\phi(X)|^2 \mid X\,\}\,\big],$$

where the inner expectation is with respect to Y, and the outer one is with respect to X. Thus

$$\sigma_\varepsilon^2 = E\big[\,E\{\,|Y-\phi(X)|^2 \mid X\,\}\,\big] = \int_{-\infty}^{+\infty} E\{\,|Y-\phi(X)|^2 \mid X\,\}\, f_X(x)\,dx. \tag{16-6}$$
To obtain the best estimator $\phi$, we need to minimize $\sigma_\varepsilon^2$ in (16-6) with respect to $\phi$. In (16-6), since $f_X(x) \ge 0$ and $E\{\,|Y-\phi(X)|^2 \mid X\,\} \ge 0$, and $\phi$ appears only in the integrand, minimization of the mean square error $\sigma_\varepsilon^2$ in (16-6) with respect to $\phi$ is equivalent to minimization of $E\{\,|Y-\phi(X)|^2 \mid X\,\}$ with respect to $\phi$.
Since X is fixed at some value, $\phi(X)$ is no longer random, and hence minimization of $E\{\,|Y-\phi(X)|^2 \mid X\,\}$ is equivalent to

$$\frac{\partial}{\partial \phi}\, E\{\,|Y-\phi(X)|^2 \mid X\,\} = 0. \tag{16-7}$$
This gives

$$E\{\,Y - \phi(X) \mid X\,\} = 0$$

or

$$E\{Y \mid X\} - E\{\phi(X) \mid X\} = 0. \tag{16-8}$$
But

$$E\{\phi(X) \mid X\} = \phi(X), \tag{16-9}$$
and hence, substituting (16-9) into (16-8), we get the desired estimator to be

$$\hat{Y} = \phi(X) = E\{Y \mid X\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{16-10}$$

Thus the conditional mean of Y given $X_1, X_2, \ldots, X_n$ represents the best estimator for Y that minimizes the mean square error.
The minimum value of the mean square error is given by
$$\sigma_{\min}^2 = E\{\,|Y - E(Y \mid X)|^2\,\} = E\big[\,\underbrace{E\{\,|Y - E(Y \mid X)|^2 \mid X\,\}}_{\mathrm{var}(Y \mid X)}\,\big] = E\{\mathrm{var}(Y \mid X)\} \ge 0. \tag{16-11}$$
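The two statements in (16-10)-(16-11) are easy to check numerically. Below is a minimal Monte Carlo sketch in Python, using a hypothetical model $Y = X^2 + N$ chosen purely for illustration: the conditional mean attains a mean square error of $E\{\mathrm{var}(Y \mid X)\}$, while a competing linear estimator does strictly worse.

```python
import numpy as np

# Monte Carlo sketch of (16-10)-(16-11): the conditional mean E{Y|X}
# minimizes the mean square error, and the minimum equals E{var(Y|X)}.
# Hypothetical model for illustration: X ~ U(0,1), Y = X^2 + N with
# N ~ N(0, 0.1^2) independent of X, so E{Y|X} = X^2 and var(Y|X) = 0.01.
rng = np.random.default_rng(0)
m = 200_000
x = rng.uniform(0.0, 1.0, m)
y = x**2 + rng.normal(0.0, 0.1, m)

mse_cond = np.mean((y - x**2) ** 2)             # conditional-mean estimator
slope, intercept = np.polyfit(x, y, 1)          # best straight-line competitor
mse_line = np.mean((y - (slope * x + intercept)) ** 2)

print(mse_cond)   # ~ 0.01 = E{var(Y|X)}, the floor in (16-11)
print(mse_line)   # strictly larger (~ 0.0156 for this model)
```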
As an example, suppose X and Y are jointly distributed with $f_{X,Y}(x,y) = kxy$, $0 < x < y < 1$, so that $f_X(x) = kx(1-x^2)/2$. Thus

$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{kxy}{kx(1-x^2)/2} = \frac{2y}{1-x^2}\,; \qquad 0 < x < y < 1. \tag{16-13}$$
Hence the best MMSE estimator is given by
$$\hat{Y} = \phi(X) = E\{Y \mid X\} = \int_x^1 y\, f_{Y \mid X}(y \mid x)\, dy = \int_x^1 \frac{2y^2}{1-x^2}\, dy = \frac{2}{1-x^2}\left.\frac{y^3}{3}\right|_x^1 = \frac{2}{3}\cdot\frac{1-x^3}{1-x^2} = \frac{2}{3}\cdot\frac{1+x+x^2}{1+x}. \tag{16-14}$$
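As a sanity check on (16-14), here is a short simulation sketch. It assumes the normalization $k = 8$ (which makes $f_{X,Y}$ above integrate to one over $0 < x < y < 1$), draws samples by rejection, and compares the empirical conditional mean near a point $x_0$ with $\frac{2}{3}\frac{1+x_0+x_0^2}{1+x_0}$.

```python
import numpy as np

# Simulation sketch verifying (16-14). For f_{X,Y}(x,y) = k*x*y on
# 0 < x < y < 1, the normalization constant is k = 8. Rejection sampling
# from the uniform proposal on the unit square: since f/8 = x*y <= 1 on
# the support, accept (x, y) with probability x*y whenever x < y.
rng = np.random.default_rng(1)
m = 2_000_000
x = rng.uniform(0.0, 1.0, m)
y = rng.uniform(0.0, 1.0, m)
u = rng.uniform(0.0, 1.0, m)
keep = (x < y) & (u < x * y)
x, y = x[keep], y[keep]

x0 = 0.5                              # check the estimator at one point
band = np.abs(x - x0) < 0.01          # narrow slice around x0
empirical = y[band].mean()            # empirical E{Y | X ~ x0}
theory = 2 * (1 + x0 + x0**2) / (3 * (1 + x0))
print(empirical, theory)              # both ~ 0.778
```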
$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\{\,|\varepsilon|^2\,\} = \min_{a_1, a_2, \ldots, a_n} E\{\varepsilon \varepsilon^*\} = \min_{a_1, a_2, \ldots, a_n} E\Big\{\varepsilon \Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^{\!*}\Big\} = E\{\varepsilon Y^*\} - \sum_{i=1}^{n} a_i^*\, E\{\varepsilon X_i^*\}. \tag{16-24}$$
But using (16-21), the second term in (16-24) is zero, since the error is orthogonal to the data $X_i$ when $a_1, a_2, \ldots, a_n$ are chosen to be optimum. Thus the minimum value of the mean square error is given by
$$\sigma_n^2 = E\{\varepsilon Y^*\} = E\Big\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big) Y^*\Big\} = E\{\,|Y|^2\,\} - \sum_{i=1}^{n} a_i\, E\{X_i Y^*\}. \tag{16-25}$$
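The orthogonality argument behind (16-24)-(16-25) is also easy to exercise numerically. The sketch below uses hypothetical real-valued data (so the conjugates drop out): it solves the normal equations for the optimum $a_i$, then confirms that the resulting error is orthogonal to every $X_i$ and that the minimum mean square error matches $E\{|Y|^2\} - \sum_i a_i E\{X_i Y^*\}$.

```python
import numpy as np

# Sketch of the orthogonality principle behind (16-24)-(16-25), with
# hypothetical real-valued data: X is a correlated 3-dimensional vector
# and Y mixes its components plus independent noise.
rng = np.random.default_rng(2)
m = 500_000
mix = np.array([[1.0, 0.3, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(m, 3)) @ mix
Y = 0.7 * X[:, 0] - 0.2 * X[:, 2] + rng.normal(0.0, 0.5, m)

R = X.T @ X / m                  # sample moments E{X_i X_j}
r = X.T @ Y / m                  # sample moments E{X_i Y}
a = np.linalg.solve(R, r)        # optimum coefficients (normal equations)
eps = Y - X @ a                  # estimation error

print(X.T @ eps / m)                           # ~ [0, 0, 0]: error _|_ data
print(np.mean(eps**2), np.mean(Y**2) - a @ r)  # equal: sigma_n^2, per (16-25)
```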
Using (16-29)-(16-30), we get
$$E\{\varepsilon X_k^*\} = E\{\varepsilon\}\, E\{X_k^*\} = 0, \qquad k = 1, 2, \ldots, n. \tag{16-31}$$
From (16-31), we obtain that $\varepsilon$ and $X_k$ are zero-mean uncorrelated random variables for $k = 1, 2, \ldots, n$. But $\varepsilon$ itself represents a Gaussian
random variable, since from (16-28) it represents a linear combination
of a set of jointly Gaussian random variables. Thus ε and X are
jointly Gaussian and uncorrelated random variables. As a result, ε and
X are independent random variables. Thus from their independence
$$E\{\varepsilon \mid X\} = E\{\varepsilon\}. \tag{16-32}$$

But from (16-30), $E\{\varepsilon\} = 0$, and hence from (16-32)

$$E\{\varepsilon \mid X\} = 0. \tag{16-33}$$
Substituting (16-28) into (16-33), we get

$$E\{\varepsilon \mid X\} = E\Big\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = 0$$
or

$$E\{Y \mid X\} = E\Big\{\sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}. \tag{16-34}$$
From (16-26), $E\{Y \mid X\} = \phi(X)$ represents the best possible estimator, and from (16-28), $\sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case.
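A quick numerical illustration of this conclusion, as a sketch with a hypothetical zero-mean jointly Gaussian pair ($n = 1$ for simplicity): the empirical conditional mean of $Y$ near any point $x_0$ agrees with the linear estimator $aX$, $a = E\{XY\}/E\{X^2\}$.

```python
import numpy as np

# Sketch for the Gaussian case above: for a zero-mean jointly Gaussian
# pair, E{Y|X} coincides with the best linear estimator a*X, where
# a = E{XY}/E{X^2}. Hypothetical pair: Y = 0.8*X + independent noise.
rng = np.random.default_rng(3)
m = 1_000_000
x = rng.normal(size=m)
y = 0.8 * x + rng.normal(0.0, 0.6, m)

a = np.mean(x * y) / np.mean(x * x)      # best linear coefficient
for x0 in (-1.0, 0.0, 1.5):
    band = np.abs(x - x0) < 0.02
    print(x0, y[band].mean(), a * x0)    # empirical E{Y|X~x0} vs a*x0
```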
Next we turn our attention to prediction problems using linear
estimators.
Linear Prediction
Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a k-step ahead prediction problem. Returning to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then
$$\hat{X}_{n+1} \stackrel{\Delta}{=} -\sum_{i=1}^{n} a_i X_i, \tag{16-35}$$

and the prediction error

$$\varepsilon = X_{n+1} - \hat{X}_{n+1} = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} = \sum_{i=1}^{n+1} a_i X_i, \qquad a_{n+1} = 1. \tag{16-36}$$
The optimum coefficients satisfy the normal equations

$$\begin{bmatrix} r_0 & r_1 & r_2 & \cdots & r_n \\ r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\ \vdots & & & & \vdots \\ r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\ r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix}. \tag{16-43}$$
Let
$$T_n = \begin{bmatrix} r_0 & r_1 & r_2 & \cdots & r_n \\ r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\ \vdots & & & \ddots & \vdots \\ r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0 \end{bmatrix}. \tag{16-44}$$
Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (16-44), the unknowns in (16-43) can be represented as
$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \\ 1 \end{bmatrix} = T_n^{-1} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix} = \sigma_n^2 \times \big(\text{last column of } T_n^{-1}\big). \tag{16-45}$$
Let
$$T_n^{-1} = \begin{bmatrix} T_n^{11} & T_n^{12} & \cdots & T_n^{1,n+1} \\ T_n^{21} & T_n^{22} & \cdots & T_n^{2,n+1} \\ \vdots & & \ddots & \vdots \\ T_n^{n+1,1} & T_n^{n+1,2} & \cdots & T_n^{n+1,n+1} \end{bmatrix}. \tag{16-46}$$
Eq. (16-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (16-45). Using these, the best one-step ahead predictor in (16-35) takes the form
$$\hat{X}_{n+1} = -\frac{1}{T_n^{n+1,n+1}} \sum_{i=1}^{n} T_n^{i,n+1} X_i, \tag{16-50}$$
and from (16-48), the minimum mean square error is given by the reciprocal of the $(n+1, n+1)$ entry of $T_n^{-1}$, i.e., $\sigma_n^2 = 1/T_n^{n+1,n+1}$.
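Here is a small sketch of (16-45) and (16-50): given the autocorrelations $r_0, \ldots, r_n$, form $T_n$, and read the predictor and its mean square error off the last column of $T_n^{-1}$. The autocorrelation sequence $r_k = \rho^k$ below is a hypothetical choice (it corresponds to an AR(1)-type process, whose true one-step predictor is $\rho X_n$ with error $1 - \rho^2$).

```python
import numpy as np
from scipy.linalg import toeplitz

# Sketch of (16-45)/(16-50): the predictor coefficients and minimum MSE
# come from the last column of Tn^{-1}. Hypothetical autocorrelations
# r_k = rho^k, for which the exact answers are known.
rho, n = 0.9, 4
r = rho ** np.arange(n + 1)        # r_0, r_1, ..., r_n
Tn = toeplitz(r)                   # Hermitian Toeplitz, per (16-44)
Tinv = np.linalg.inv(Tn)

sigma2 = 1.0 / Tinv[-1, -1]        # minimum MSE: reciprocal corner entry
a = sigma2 * Tinv[:-1, -1]         # a_1, ..., a_n, per (16-45)

print(a)        # ~ [0, 0, 0, -0.9]: predictor X^_{n+1} = 0.9 * X_n
print(sigma2)   # ~ 1 - rho^2 = 0.19
```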
From (16-36), since the one-step linear prediction error

$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{16-51}$$
we can represent (16-51) formally as follows
$$X_{n+1} \;\longrightarrow\; \boxed{\,1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}\,} \;\longrightarrow\; \varepsilon_n$$
Thus, let

$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}\,; \tag{16-52}$$
then from the above figure, we also have the representation
$$\varepsilon_n \;\longrightarrow\; \boxed{\,\dfrac{1}{A_n(z)}\,} \;\longrightarrow\; X_{n+1}.$$
The filter

$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{16-53}$$

represents an AR(n) filter, and this shows that linear prediction leads to an autoregressive (AR) model.
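To make the AR(n) interpretation concrete, the sketch below (with hypothetical, stable coefficients built from poles placed inside the unit circle) drives $1/A_n(z)$ with white noise to synthesize the process, then filters the result with $A_n(z)$ and recovers the driving noise, i.e., the prediction errors.

```python
import numpy as np
from scipy.signal import lfilter

# Sketch of (16-53): 1/A_n(z) driven by white noise synthesizes the
# process; A_n(z) applied to the process whitens it back into the
# prediction errors. Hypothetical stable filter: place the poles of
# 1/A_n(z) inside the unit circle. Note the ordering in (16-52):
# the vector A holds [1, a_n, a_{n-1}, ..., a_1].
poles = [0.6, -0.4, 0.3 + 0.3j, 0.3 - 0.3j]
A = np.real(np.poly(poles))              # denominator coefficients

rng = np.random.default_rng(4)
w = rng.normal(size=100_000)             # white driving noise
x = lfilter([1.0], A, w)                 # AR(n) synthesis: 1/A_n(z)
eps = lfilter(A, [1.0], x)               # analysis filter A_n(z)

print(np.allclose(eps, w))               # True: the filters are inverses
```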
The polynomial $A_n(z)$ in (16-52)-(16-53) can be simplified using (16-43)-(16-44). To see this, we rewrite $A_n(z)$ as

$$A_n(z) = a_1 z^{-n} + a_2 z^{-(n-1)} + \cdots + a_n z^{-1} + 1 = [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,] \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix} = [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,]\, T_n^{-1} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix}, \tag{16-54}$$

where the last step makes use of (16-45).
To simplify (16-54), we can use the determinant identity

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,\big|D - CA^{-1}B\big|. \tag{16-56}$$

In particular, if $D \equiv 0$ and $CA^{-1}B$ is a scalar, we get

$$CA^{-1}B = \frac{-1}{|A|} \begin{vmatrix} A & B \\ C & 0 \end{vmatrix}. \tag{16-57}$$
Using (16-57) in (16-54) with

$$C = [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,], \qquad A = T_n, \qquad B = [\,0, 0, \ldots, 0, \sigma_n^2\,]^T,$$
we get
$$A_n(z) = \frac{-1}{|T_n|} \begin{vmatrix} r_0 & r_1 & r_2 & \cdots & r_n & 0 \\ r_1^* & r_0 & r_1 & \cdots & r_{n-1} & 0 \\ \vdots & & & & & \vdots \\ r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 & 0 \\ r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0 & \sigma_n^2 \\ z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 1 & 0 \end{vmatrix} = \frac{\sigma_n^2}{|T_n|} \begin{vmatrix} r_0 & r_1 & r_2 & \cdots & r_n \\ r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\ \vdots & & & & \vdots \\ r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\ z^{-n} & z^{-(n-1)} & \cdots & z^{-1} & 1 \end{vmatrix}. \tag{16-58}$$
or

$$\sigma_{n+1}^2 = \sigma_n^2 \big(1 - |s_{n+1}|^2\big) < \sigma_n^2, \tag{16-64}$$

since $(1 - |s_{n+1}|^2) < 1$. Thus the mean square error decreases as more and more samples from the past are used in the linear predictor. In general, from (16-64) the mean square errors for the one-step predictor form a monotone nonincreasing sequence
$$\sigma_n^2 \ge \sigma_{n+1}^2 \ge \cdots \ge \sigma_k^2 \ge \cdots \longrightarrow \sigma_\infty^2, \tag{16-65}$$

whose limiting value $\sigma_\infty^2 \ge 0$.
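This monotonicity is easy to observe numerically. Since (16-45) gives $\sigma_n^2 = 1/\big(T_n^{-1}\big)_{n+1,n+1}$, the error for increasing predictor orders can be computed directly from the autocorrelation sequence; the sketch below uses a hypothetical damped-cosine autocorrelation $r_k = 0.95^k \cos(0.3k)$, assumed positive definite.

```python
import numpy as np
from scipy.linalg import toeplitz

# Sketch of (16-65): the one-step prediction error sigma_n^2 cannot
# increase with the predictor order n. Hypothetical autocorrelation
# sequence r_k = 0.95^k * cos(0.3k), a positive definite damped cosine.
r = 0.95 ** np.arange(12) * np.cos(0.3 * np.arange(12))
for n in range(1, 12):
    Tn = toeplitz(r[: n + 1])
    sigma2 = 1.0 / np.linalg.inv(Tn)[-1, -1]   # per (16-45)
    print(n, sigma2)                           # nonincreasing in n
```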
Clearly, $\sigma_\infty^2 \ge 0$ corresponds to the irreducible error in linear prediction using the entire past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation
$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right\} \ge 0. \tag{16-66}$$
[Figure: power spectrum $S_{XX}(\omega)$ of a regular stochastic process, plotted over $-\pi \le \omega \le \pi$]
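Equation (16-66) is the Kolmogorov-Szegő formula. As a numeric sanity check, consider a hypothetical AR(1) process $X_k = \rho X_{k-1} + W_k$ with innovation variance $\sigma_w^2$ and spectrum $S_{XX}(\omega) = \sigma_w^2 / |1 - \rho e^{-j\omega}|^2$; the irreducible one-step error predicted by (16-66) should come out exactly $\sigma_w^2$.

```python
import numpy as np

# Numeric check of (16-66) for a hypothetical AR(1) spectrum
# S_XX(w) = s2w / |1 - rho*exp(-jw)|^2, whose irreducible one-step
# prediction error is the innovation variance s2w.
rho, s2w = 0.8, 0.5
w = np.linspace(-np.pi, np.pi, 200_001)
S = s2w / np.abs(1.0 - rho * np.exp(-1j * w)) ** 2

# (1/2pi) * integral of ln S over one period ~ mean over a uniform grid
sigma_inf2 = np.exp(np.mean(np.log(S)))
print(sigma_inf2, s2w)                 # both ~ 0.5
```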