Lessons in Digital Estimation Theory

PRENTICE-HALL SIGNAL PROCESSING SERIES
Alan V. Oppenheim, Editor

MENDEL, JERRY M., (date)
Lessons in digital estimation theory.
Bibliography: p.
Includes index.
1. Estimation theory. I. Title.
QA276.8.M46 1986    511'.4    86-9365
ISBN 0-13-530809-7

Editorial/production supervision: Gretchen K. Chenenko
Cover design: Lundgren Graphics
Manufacturing buyer: Gordon Osbourne

Contents

LESSON 1  INTRODUCTION, COVERAGE, AND PHILOSOPHY  1
LESSON 14  ESTIMATION OF RANDOM PARAMETERS: THE LINEAR AND GAUSSIAN MODEL  118
    Introduction 118
    Mean-Squared Estimator 118
    Best Linear Unbiased Estimation, Revisited 121
    Maximum a Posteriori Estimator 123
    Problems 126

LESSON 15  ELEMENTS OF DISCRETE-TIME GAUSS-MARKOV RANDOM PROCESSES  128
    Introduction 128
    Definitions and Properties of Discrete-Time Gauss-Markov Random Processes 128
    A Basic State-Variable Model 131
    Properties of the Basic State-Variable Model 133
    Signal-to-Noise Ratio 137
    Problems 138

LESSON 16  STATE ESTIMATION: PREDICTION  140
    Introduction 140
    Single-Stage Predictor 140
    A General State Predictor 142
    The Innovations Process 146
    Problems 147

LESSON 18  STATE ESTIMATION: FILTERING EXAMPLES  160
    Introduction 160
    Examples 160
    Problems 169

LESSON 19  STATE ESTIMATION: STEADY-STATE KALMAN FILTER AND ITS RELATIONSHIP TO A DIGITAL WIENER FILTER  170
    Introduction 170
    Steady-State Kalman Filter 170
    Single-Channel Steady-State Kalman Filter 173
    Relationships Between the Steady-State Kalman Filter and a Finite Impulse Response Digital Wiener Filter 176
    Comparisons of Kalman and Wiener Filters 181
    Problems 182

LESSON 20  STATE ESTIMATION: SMOOTHING  183
    Three Types of Smoothers 183
    Approaches for Deriving Smoothers 184
    A Summary of Important Formulas 184
    Single-Stage Smoother 184
    Double-Stage Smoother 187
    Single- and Double-Stage Smoothers as General Smoothers 189
    Problems 192

LESSON A (SUPPLEMENTAL)  SUFFICIENT STATISTICS AND STATISTICAL ESTIMATION OF PARAMETERS  282
    Introduction 282
    Concept of Sufficient Statistics 282
    Exponential Families of Distributions 284
    Exponential Families and Maximum-Likelihood Estimation 287
    Sufficient Statistics and Uniformly Minimum-Variance Unbiased Estimation 290
    Problems 294

REFERENCES  300

INDEX
ables. To some readers, this lesson may be a review of material already known to them.

General results for both mean-squared and maximum a posteriori estimation of random parameters are covered in Lesson 13. These results are specialized to the important case of the linear and Gaussian model in Lesson 14. Best linear unbiased and weighted least-squares estimation are also revisited in Lesson 14. Lesson 14 is quite important, because it gives conditions under which mean-squared, maximum a posteriori, best linear unbiased, and weighted least-squares estimates of random parameters are identical. Lesson A, which is a supplemental one, is on the subject of sufficient statistics and statistical estimation of parameters. It fits in very nicely after Lesson 14.

Lesson 15 provides a transition from our study of parameter estimation to our study of state estimation. It provides much useful information about elements of discrete-time Gauss-Markov random processes, and also establishes the basic state-variable model, and its statistical properties, for which we derive a wide variety of state estimators. To some readers, this lesson may be a review of material already known to them.

Lessons 16 through 22 cover state estimation for the Lesson 15 basic state-variable model. Prediction is treated in Lesson 16. The important innovations process is also covered in that lesson. Filtering is the subject of Lessons 17, 18, and 19. The mean-squared state filter, commonly known as the Kalman filter, is developed in Lesson 17. Five examples which illustrate some interesting numerical and theoretical aspects of Kalman filtering are presented in Lesson 18. Lesson 19 establishes a bridge between mean-squared estimation and mean-squared digital signal processing. It shows how the steady-state Kalman filter is related to a digital Wiener filter. The latter is widely used in digital signal processing. Smoothing is the subject of Lessons 20, 21, and 22. Fixed-interval, fixed-point, and fixed-lag smoothers are developed in Lessons 20 and 21. Lesson 22 presents some applications which illustrate interesting numerical and theoretical aspects of fixed-interval smoothing. These applications are taken from the field of digital signal processing and include minimum-variance deconvolution, maximum-likelihood deconvolution, and recursive waveshaping.

Lesson 23 shows how to modify results given in Lessons 16, 17, 19, 20, and 21 from the basic state-variable model to a state-variable model that includes the following effects:

1. nonzero mean noise processes and/or known bias function in the measurement equation,
2. correlated noise processes,
3. colored noise processes, and
4. perfect measurements.

Lesson 24 provides a transition from our study of estimation for linear models to estimation for nonlinear models. Because many real-world systems are continuous-time in nature and nonlinear, this lesson explains how to linearize and discretize a nonlinear differential equation model.

Lesson 25 is devoted primarily to the extended Kalman filter (EKF), which is a form of the Kalman filter that has been extended to nonlinear dynamical systems of the type described in Lesson 24. The EKF is related to the method of iterated least squares (ILS), the major difference between the two being that the EKF is for dynamical systems whereas ILS is not. This lesson also shows how to apply the EKF to parameter estimation, in which case states and parameters can be estimated simultaneously, and in real time.

The problem of obtaining maximum-likelihood estimates of a collection of parameters that appears in the basic state-variable model is treated in Lesson 26. The solution involves state and parameter estimation, but calculations can only be performed off-line, after data from an experiment has been collected.

The Kalman-Bucy filter, which is the continuous-time counterpart to the Kalman filter, is derived from two different viewpoints in Lesson 27. We include this lesson because the Kalman-Bucy filter is widely used in linear stochastic optimal control theory.

PHILOSOPHY

The digital viewpoint is emphasized throughout this book. Our estimation algorithms are digital in nature; many are recursive. The reasons for the digital viewpoint are:

1. much real data is collected in a digitized manner, so it is in a form ready to be processed by digital estimation algorithms, and
2. the mathematics associated with digital estimation theory are simpler than those associated with continuous estimation theory.

Regarding (2), we mention that very little knowledge about random processes is needed to derive digital estimation algorithms, because digital (i.e., discrete-time) random processes can be treated as vectors of random variables. Much more knowledge about random processes is needed to design continuous-time estimation algorithms.

Suppose our underlying model is continuous-time in nature. We are faced with two choices: develop a continuous-time estimation theory and then implement the resulting estimators on a digital computer (i.e., discretize the continuous-time estimation algorithm), or discretize the model and develop a discrete-time (i.e., digital) estimation theory that leads to digital estimation algorithms.
Lesson 2
The Linear Model

EXAMPLES
The solution to (2-9) is
    x(k) = Φᵏx(0)   (2-11)
so that
    z(k) = h'Φᵏx(0) + v(k)   (2-12)
Collecting the N measurements, as before, we obtain (2-13).

…to implement an optimal control law for it, or to implement a digital signal processor. Usually, we cannot measure the entire state vector, and our measurements are corrupted by noise. In state estimation, our objective is to estimate the entire state vector from a limited collection of noisy measurements.

Here we consider the problem of estimating the n × 1 state vector x(k), at k = 1, 2, …, N, from a scalar measurement z(k), where k = 1, 2, …, N. The model for this example is
    x(k + 1) = Φx(k) + γu(k)   (2-14)
    z(k) = h'x(k) + v(k)   (2-15)
We are keeping this example simple by assuming that the system is time-invariant and has only one input and one output; however, the results obtained in this example are easily generalized to time-varying and multichannel systems.

If we try to collect our N measurements as before, we obtain
    z(N) = h'x(N) + v(N)
    z(N − 1) = h'x(N − 1) + v(N − 1)
    ⋮              (2-16)
    z(1) = h'x(1) + v(1)

where k ≥ j + 1. We now focus our attention on the value of x(k) at k = k₁, where 1 ≤ k₁ ≤ N. Using (2-17), we can express x(N), x(N − 1), …, x(k₁ + 1) as an explicit function of x(k₁), i.e.,

    x(k) = Φ^(k−k₁)x(k₁) + Σ_{i=k₁+1}^{k} Φ^(k−i)γu(i − 1)   (2-18)

where k = k₁ + 1, k₁ + 2, …, N. In order to do the same for x(1), x(2), …, x(k₁ − 1), we solve (2-17) for x(j) and set k = k₁, where j = k₁ − 1, k₁ − 2, …, 1.

These N equations can now be collected together, to give (2-21), in which the measurement-noise vector is col (v(N), v(N − 1), …, v(1)) and an additional term involves a matrix M(N, k₁); the exact structure of matrix M(N, k₁) is not important to us at this point. Observe that the state at k = k₁ plays the role of parameter vector θ and that both X and V are different for different values of k₁.

If x(0) and the system input u(k) are deterministic, then x(k) is deterministic for all k.
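The solution form (2-18) can be checked numerically. The sketch below (Python; Φ, γ, and the input sequence u are arbitrary illustrative values, not taken from the book) propagates the state equation (2-14) directly and compares the result with the closed-form expression:

```python
import numpy as np

# Check of the state-equation solution form used in Example 2-4:
# x(k) = Phi^(k-k1) x(k1) + sum_{i=k1+1}^{k} Phi^(k-i) gamma u(i-1),  k >= k1+1.
# The system matrices below are arbitrary illustrative values, not from the book.
Phi = np.array([[0.9, 0.1],
                [0.0, 0.8]])
gamma = np.array([1.0, 0.5])
u = np.array([1.0, -1.0, 2.0, 0.5, 1.5, -0.5])          # u(0), ..., u(5)

# Propagate x(k+1) = Phi x(k) + gamma u(k) directly.
x = [np.zeros(2)]
for k in range(len(u)):
    x.append(Phi @ x[k] + gamma * u[k])

# Closed-form value of x(k) in terms of x(k1), per (2-18).
k1, k = 2, 6
xk = np.linalg.matrix_power(Phi, k - k1) @ x[k1]
for i in range(k1 + 1, k + 1):
    xk += np.linalg.matrix_power(Phi, k - i) @ (gamma * u[i - 1])

assert np.allclose(xk, x[k])      # both routes give the same state
```
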
… (2-24)

where k = 1, 2, …, N. It is easy to see how to collect these N equations, to give a linear model in which ∂f(θ*, k)/∂θ* is short for ∂f(θ, k)/∂θ evaluated at θ = θ*. Observe that X depends on θ*. We will discuss different ways for specifying θ* in Lesson 25.

Example 2-6  Deconvolution (Mendel, 1983b)

In Example 2-1 we showed how a convolutional model could be expressed as the linear model Z = Xθ + V. In that example we assumed that both input and output measurements were available, and we wanted to estimate the sampled values of the system's impulse response. Here we begin with the same convolutional model, written as (2-28). We shall often refer to θ as μ. Using (2-27), we can also express θ = μ as
    μ = Qr   (2-29)
where
    r = col (r(1), r(2), …, r(N))   (2-30)
and
    Q = diag (q(1), q(2), …, q(N))   (2-31)
In this case (2-28) can be expressed as
    Z(N) = X(N − 1)Qr + V(N)   (2-32)
When event locations q(1), q(2), …, q(N) are known, then we can view (2-32) as a linear model for determining r.

Regardless of which linear deconvolution model we use as our starting point for determining μ, we see that deconvolution corresponds to case B.1. Put another way, we have shown that the design of a deconvolution signal processing filter is isomorphic to the problem of estimating random parameters in a linear model. Note, however, that the dimension of θ, which is N × 1, increases as the number of measurements increases. In all other examples θ was n × 1, where n is a fixed integer. We return to this point in Lesson 14, where we discuss convergence of estimates of θ to their true values.

In Lesson 14, we shall develop minimum-variance and maximum-likelihood deconvolution filters. Equation (2-28) is the starting point for derivation of the former filter, whereas Equation (2-32) is the starting point for derivation of the latter filter. □

NOTATIONAL PRELIMINARIES

Equation (2-1) can be interpreted as a data generating model; it is a mathematical representation that is associated with the data. Parameter vector θ is assumed to be unknown and is to be estimated using Z(k), X(k), and possibly other a priori information. We use θ̂(k) to denote the estimate of constant parameter vector θ. Argument k in θ̂(k) denotes the fact that the estimate is based on measurements up to and including the kth. In our preceding examples, we would use the following notation for θ̂(k):

    Example 2-1 [see (2-9)]: θ̂(N) with components ĥ(i|N)
    Example 2-2 [see (2-8)]: θ̂(N)
    Example 2-3 [see (2-13)]: x̂(0|N)
    Example 2-4 [see (2-21)]: x̂(k₁|N)
    Example 2-5 [see (2-25)]: θ̂(N)
    Example 2-6 [see (2-28)]: μ̂(N) with components μ̂(i|N)

…smoothed or interpolated estimate. Prediction and filtering can be done in real time, whereas smoothing can never be done in real time. We will see that the impulse responses of predictors and filters are causal, whereas the impulse response of a smoother is noncausal.

We use θ̃(k) to denote estimation error, i.e.,
    θ̃(k) = θ − θ̂(k)   (2-33)
In state estimation, x̃(k₁|N) denotes state estimation error, and x̃(k₁|N) = x(k₁) − x̂(k₁|N). In deconvolution, μ̃(i|N) is defined in a similar manner.

Very often we use the following estimation model for Ẑ(k):
    Ẑ(k) = X(k)θ̂(k)   (2-34)
To obtain (2-34) from (2-1), we assume that V(k) is zero-mean random noise that cannot be measured. In some applications (e.g., Example 2-2) Ẑ(k) represents a predicted value of Z(k). Associated with Ẑ(k) is the error Z̃(k), where
    Z̃(k) = Z(k) − Ẑ(k)   (2-35)
satisfies the equation
    Z̃(k) = X(k)θ̃(k) + V(k)   (2-36)
In those applications where Ẑ(k) is a predicted value of Z(k), Z̃(k) is known as a prediction error. Other names for Z̃(k) are equation error and measurement residual.

In the rest of this book we develop specific structures for θ̂(k). These structures are referred to as estimators. Estimates are obtained whenever data is processed by an estimator. Estimator structures are associated with specific estimation techniques, and these techniques can be classified according to the natures of θ and X(k), and what a priori information is assumed known about noise vector V(k). See Lesson 1 for an overview of all the different estimation techniques that are covered in this book.
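The relationships (2-33)-(2-36) among the estimate, the estimation error, and the measurement residual can be verified numerically. The following sketch (Python, with arbitrary test data, and an ordinary least-squares estimate standing in for θ̂(k)) checks the identity (2-36):

```python
import numpy as np

# Numerical check of the error relationships (2-33)-(2-36) for the linear
# model Z(k) = X(k) theta + V(k).  All values below are arbitrary test data.
rng = np.random.default_rng(0)
N, n = 20, 3
X = rng.normal(size=(N, n))
theta = np.array([1.0, -2.0, 0.5])
V = rng.normal(scale=0.1, size=N)
Z = X @ theta + V

# Any estimate theta_hat (here, ordinary least squares) defines
# Z_hat = X theta_hat (2-34), theta_tilde = theta - theta_hat (2-33),
# and Z_tilde = Z - Z_hat (2-35).
theta_hat, *_ = np.linalg.lstsq(X, Z, rcond=None)
theta_tilde = theta - theta_hat
Z_tilde = Z - X @ theta_hat

# (2-36): the residual equals X theta_tilde + V, exactly.
assert np.allclose(Z_tilde, X @ theta_tilde + V)
```
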
Lesson 3
Least-Squares Estimation: Batch Processing

DERIVATION OF ESTIMATOR
This is a system of n linear equations in the n components of θ̂_WLS(k). In practice, one does not compute θ̂_WLS(k) using (3-10), because computing the inverse of X'(k)W(k)X(k) is fraught with numerical difficulties. Instead, the normal equations are solved using stable algorithms from numerical linear algebra that involve orthogonal transformations (see, e.g., Stewart, 1973; Bierman, 1977; and Dongarra et al., 1979). Because it is not the purpose of this book to go into details of numerical linear algebra, we leave it to the reader to pursue this important subject. Based on this discussion, we must view (3-10) as a useful theoretical formula and not as a useful computational formula. Remember that this is a book on estimation theory, so for our purposes theoretical formulas are just fine.

5. Equation (3-13) can also be reexpressed as
    Z̃'(k)W(k)X(k) = 0   (3-14)
which can be viewed as an orthogonality condition between Z̃(k) and W(k)X(k). Orthogonality conditions play an important role in estimation theory. We shall see many more examples of such conditions throughout this book.

6. Estimates obtained from (3-10) will be random! This is because Z(k) is random, and, in some applications, even X(k) is random. It is therefore instructive to view (3-10) as a complicated transformation of vectors or matrices of random variables into the vector of random variables θ̂_WLS(k). In later lessons, when we examine the properties of θ̂_WLS(k), these will be statistical properties, because of the random nature of θ̂_WLS(k).

Example 3-1 (Mendel, 1973, pp. 86-87)

Suppose we wish to calibrate an instrument by making a series of uncorrelated measurements on a constant quantity. Denoting the constant quantity as θ, our measurement equation becomes
    z(k) = θ + v(k)   (3-15)
where k = 1, 2, …, N. Collecting these N measurements, we have (3-16).

Example 3-2 (Mendel, 1973)

Figure 3-1 depicts simplified third-order pitch-plane dynamics for a typical, high-performance, aerodynamically controlled aerospace vehicle. Cross-coupling and body-bending effects are neglected. Normal acceleration control is considered with feedback on normal acceleration and angle-of-attack rate. Stefani (1967) shows that the system gains K_Ṅi, K_α̇, and K_NA can be chosen, as in (3-18)-(3-21), in terms of M_α, M_δ, Z_N, and two design constants C₁ and C₂. Stefani assumes a term involving Z_N is relatively small, and chooses C₁ = 1400 and C₂ = 14,000. The closed-loop response resembles that of a second-order system with a bandwidth of 2 Hz and a damping ratio of 0.6 that responds to a step command of input acceleration with zero steady-state error.

In general, M_α, M_δ, and Z_N are dynamic parameters and all vary through a large range of values. Also, M_α may be positive (unstable vehicle) or negative (stable vehicle). System response must remain the same for all values of M_α, M_δ, and Z_N; thus, it is necessary to estimate these parameters so that K_Ṅi, K_α̇, and K_NA can be adapted to keep C₁ and C₂ invariant at their designed values. For present purposes we shall assume that M_α, M_δ, and Z_N are frozen at specific values.

From Fig. 3-1,
    θ̈(t) = M_α α(t) + M_δ δ(t)   (3-22)
and
    N_A(t) = Z_N α(t)   (3-23)
Our attention is directed at the estimation of M_α and M_δ in (3-22). We leave it as an exercise for the reader to explore the estimation of Z_N in (3-23).

Our approach will be to estimate M_α and M_δ from the equation
    θ̈_m(k) = M_α α(k) + M_δ δ(k) + v_θ̈(k)   (3-24)
where θ̈_m(k) denotes the measured value of θ̈(k), that is corrupted by measurement noise v_θ̈(k). We shall assume (somewhat unrealistically) that α(k) and δ(k) can both be measured perfectly. The concatenated measurement equation for N measurements is
    col (θ̈_m(k), θ̈_m(k − 1), …, θ̈_m(k − N + 1)) = X(k) col (M_α, M_δ) + V(k)   (3-25)
where row j + 1 of X(k) is (α(k − j)  δ(k − j)), j = 0, 1, …, N − 1. The associated normal equations involve sums of the form
    Σ_{j=0}^{N−1} α(k − j)θ̈_m(k − j)   (3-26)
□

Figure 3-1  Pitch-plane dynamics and nomenclature: Nᵢ, input normal acceleration along the negative Z axis; K_Ṅi, gain on Ṅᵢ; δ, control-surface deflection; M_δ, control-surface effectiveness; K_α̇, control gain on α̇; Z_N, normal acceleration force coefficient; ẋ, axial velocity; N_A, system-achieved normal acceleration along the negative Z axis; K_NA, control gain on N_A.
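As a rough numerical companion to Example 3-2, the sketch below (Python) simulates noisy measurements from (3-24) with assumed "frozen" values of M_α and M_δ (the numbers are invented for illustration and are not Stefani's) and recovers the parameters by least squares:

```python
import numpy as np

# Sketch of the Example 3-2 setup: estimate M_alpha and M_delta from noisy
# pitch-acceleration measurements (3-24).  All numbers are illustrative only.
rng = np.random.default_rng(1)
M_alpha, M_delta = -5.0, 20.0          # assumed "frozen" true parameter values
N = 200
alpha = rng.normal(size=N)             # measured perfectly, per the example
delta = rng.normal(size=N)
noise = rng.normal(scale=0.5, size=N)  # measurement noise on theta-double-dot
theta_ddot_m = M_alpha * alpha + M_delta * delta + noise

# Concatenated measurement equation (3-25): Z = X theta + V, with
# X = [alpha  delta] row by row and theta = col(M_alpha, M_delta).
Xmat = np.column_stack([alpha, delta])
est, *_ = np.linalg.lstsq(Xmat, theta_ddot_m, rcond=None)
print(est)   # close to (M_alpha, M_delta) when the noise is small
```
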
Lesson 4
Least-Squares Estimation: Recursive Processing

INTRODUCTION

In Lesson 3 we assumed that Z(k) contained N elements, where N > dim θ = n. Suppose we decide to add more measurements, increasing the total number of them from N to N′. Formula (3-10) in Lesson 3 would not make use of the previously calculated value of θ̂ that is based on N measurements during the calculation of θ̂ that is based on N′ measurements. This seems quite wasteful. We intuitively feel that it should be possible to compute the estimate based on N′ measurements from the estimate based on N measurements, and a modification of this earlier estimate to account for the N′ − N new measurements. In this lesson we shall justify our intuition.

In Lesson 3 we also assumed that θ̂ is determined for a fixed value of n. In many system modeling problems one is interested in a preliminary model in which dimension n is a variable. This is becoming increasingly more important as we begin to model large-scale societal, energy, economic, etc., systems in which it may not be clear at the onset what effects are most important. One approach is to recompute θ̂ by means of Formula (3-10) in Lesson 3 for different values of n. This may be very costly, especially for large-scale systems, since the number of flops to compute θ̂ is on the order of n³. A second approach is to obtain θ̂ for n = n₁, and to use that estimate in a computationally effective manner to obtain θ̂ for n = n₂, where n₂ > n₁. These estimators are recursive in the dimension of θ. We shall also examine these estimators.

RECURSIVE LEAST-SQUARES: INFORMATION FORM

To begin, we consider the case when one additional measurement z(k + 1), made at t_{k+1}, becomes available:
    z(k + 1) = h'(k + 1)θ + v(k + 1)   (4-1)
When this equation is combined with our earlier linear model, we obtain a new linear model,
    Z(k + 1) = X(k + 1)θ + V(k + 1)   (4-2)
where
    Z(k + 1) = col (z(k + 1), Z(k))   (4-3)
    X(k + 1) = col (h'(k + 1), X(k))   (4-4)
and
    V(k + 1) = col (v(k + 1), V(k))   (4-5)
Using (3-10) from Lesson 3 and (4-2), it is clear that
    θ̂_WLS(k + 1) = [X'(k + 1)W(k + 1)X(k + 1)]⁻¹X'(k + 1)W(k + 1)Z(k + 1)   (4-6)
To proceed further, we must assume that W is diagonal, i.e.,
    W(k + 1) = diag (w(k + 1), W(k))   (4-7)
We shall now show that it is possible to determine θ̂(k + 1) from θ̂(k) and z(k + 1).

Theorem 4-1 (Information Form of Recursive LSE). A recursive structure for θ̂_WLS(k) is
    θ̂_WLS(k + 1) = θ̂_WLS(k) + K_w(k + 1)[z(k + 1) − h'(k + 1)θ̂_WLS(k)]   (4-8)
where
    K_w(k + 1) = P(k + 1)h(k + 1)w(k + 1)   (4-9)
and
    P⁻¹(k + 1) = P⁻¹(k) + h(k + 1)w(k + 1)h'(k + 1)   (4-10)
These equations are initialized by θ̂_WLS(n) and P⁻¹(n), where P(k) is defined below in (4-13), and are used for k = n, n + 1, …, N − 1.
Proof. Substitute (4-3), (4-4), and (4-7) into (4-6) (sometimes dropping the dependence upon k and k + 1, for notational simplicity) to see that
    θ̂_WLS(k + 1) = [X'(k + 1)W(k + 1)X(k + 1)]⁻¹[hwz + X'WZ]   (4-11)
Express θ̂_WLS(k) as
    θ̂_WLS(k) = P(k)X'(k)W(k)Z(k)   (4-12)
where
    P(k) = [X'(k)W(k)X(k)]⁻¹   (4-13)
From (4-12) and (4-13) it is straightforward to show that
    X'(k)W(k)Z(k) = P⁻¹(k)θ̂_WLS(k)   (4-14)
and
    P⁻¹(k + 1) = P⁻¹(k) + h(k + 1)w(k + 1)h'(k + 1)   (4-15)
It now follows that
    θ̂_WLS(k + 1) = P(k + 1)[hwz + P⁻¹(k)θ̂_WLS(k)]
        = P(k + 1){hwz + [P⁻¹(k + 1) − hwh']θ̂_WLS(k)}
        = θ̂_WLS(k) + P(k + 1)hw[z − h'θ̂_WLS(k)]   (4-16)
        = θ̂_WLS(k) + K_w(k + 1)[z − h'θ̂_WLS(k)]
which is (4-8) when gain matrix K_w is defined as in (4-9).

Based on preceding discussions about dim θ = n and dim Z(k) = N, we know that the first value of N for which (3-10) in Lesson 3 can be used is N = n; thus, (4-8) must be initialized by θ̂_WLS(n), which is computed using (3-10) in Lesson 3. Equation (4-10) is also a recursive equation for P⁻¹(k + 1), which is initialized by P⁻¹(n) = X'(n)W(n)X(n). □

Comments

1. Equation (4-8) can also be expressed as
    θ̂_WLS(k + 1) = [I − K_w(k + 1)h'(k + 1)]θ̂_WLS(k) + K_w(k + 1)z(k + 1)   (4-17)
which demonstrates that the recursive least-squares estimator (LSE) is a time-varying digital filter that is excited by random inputs (i.e., the measurements), one whose plant matrix may itself be random, because K_w and h(k + 1) may be random. The random natures of K_w and (I − K_w h') make the analysis of this filter exceedingly difficult. If K_w and h are deterministic, then stability of this filter can be studied using Lyapunov stability theory.

2. In (4-8), the term h'(k + 1)θ̂_WLS(k) is a prediction of the actual measurement z(k + 1). Because θ̂_WLS(k) is based on Z(k), we express this predicted value as ẑ(k + 1|k), i.e.,
    ẑ(k + 1|k) = h'(k + 1)θ̂_WLS(k)   (4-18)
so that
    θ̂_WLS(k + 1) = θ̂_WLS(k) + K_w(k + 1)[z(k + 1) − ẑ(k + 1|k)]   (4-19)

3. Two recursions are present in our recursive LSE. The first is the vector recursion for θ̂_WLS given by (4-8). Clearly θ̂_WLS(k + 1) cannot be computed from this expression until measurement z(k + 1) is available. The second is the matrix recursion for P⁻¹ given by (4-10). Observe that values for P⁻¹ (and subsequently K_w) can be precomputed before measurements are made.

4. A digital computer implementation of (4-8)-(4-10) proceeds as follows:
    P⁻¹(k + 1) → P(k + 1) → K_w(k + 1) → θ̂_WLS(k + 1)

5. Equations (4-8)-(4-10) can also be used for k = 0, 1, …, N − 1 using the following values for P⁻¹(0) and θ̂_WLS(0):
    P⁻¹(0) = I/a + h(0)w(0)h'(0)   (4-20)
and
    θ̂_WLS(0) = P(0)[ξ/a + h(0)w(0)z(0)]   (4-21)
In these equations (which are derived in Mendel, 1973, pp. 101-106; see, also, Problem 4-1) a is a very large number, ε is a very small number, ε ≪ 1, and ξ = col (ε, ε, …, ε). When these initial values are used in (4-8)-(4-10) for k = 0, 1, …, n − 1, then the resulting values obtained for θ̂_WLS(n) and P⁻¹(n) are the very same ones that are obtained from the batch formulas for θ̂_WLS(n) and P⁻¹(n).
Often z(0) = 0, or there is no measurement made at k = 0, so that we can set z(0) = 0. In this case we can set w(0) = 0, so that P⁻¹(0) = I/a and θ̂(0) = ξ. By choosing ε on the order of 1/a, we see that (4-8)-(4-10) can be initialized by setting θ̂(0) = 0 and P(0) equal to a diagonal matrix of very large numbers.

6. The reason why the results in Theorem 4-1 are referred to as the information form of the recursive LSE is deferred until Lesson 11, where connections are made between least-squares and maximum-likelihood estimators (see the section entitled "The Linear Model (X(k) Deterministic)," Lesson 11).
MATRIX INVERSION LEMMA

Equations (4-10) and (4-9) require the inversion of the n × n matrix P. If n is large, then this will be a costly computation. Fortunately, an alternative is available, one that is based on the following matrix inversion lemma.

Lemma 4-1. If the matrices A, B, C, and D satisfy the equation
    B⁻¹ = A⁻¹ + CD⁻¹C'   (4-22)
where all matrix inverses are assumed to exist, then
    B = A − AC(C'AC + D)⁻¹C'A   (4-23)

Proof. Multiply B⁻¹ by B using (4-23) and (4-22) to show that BB⁻¹ = I. For a constructive proof of this lemma, see Mendel (1973), pp. 96-97. □

Observe that if A and B are n × n matrices, C is n × m, and D is m × m, then to compute B from (4-23) requires the inversion of one m × m matrix. On the other hand, to compute B from (4-22) requires the inversion of one m × m matrix and two n × n matrices [A⁻¹ and (B⁻¹)⁻¹]. When m < n it is definitely advantageous to compute B using (4-23) instead of (4-22). Observe, also, that in the special case when m = 1, matrix inversion in (4-23) is replaced by division.

RECURSIVE LEAST-SQUARES: COVARIANCE FORM

Theorem 4-2 (Covariance Form of Recursive LSE). Another recursive structure for θ̂_WLS(k) is
    θ̂_WLS(k + 1) = θ̂_WLS(k) + K_w(k + 1)[z(k + 1) − h'(k + 1)θ̂_WLS(k)]   (4-24)
where
    K_w(k + 1) = P(k)h(k + 1)[h'(k + 1)P(k)h(k + 1) + 1/w(k + 1)]⁻¹   (4-25)
and
    P(k + 1) = [I − K_w(k + 1)h'(k + 1)]P(k)   (4-26)

Proof. In Lemma 4-1, let A = P(k), B = P(k + 1), C = h(k + 1), and D = 1/w(k + 1). Then (4-10) looks like (4-22), so, using (4-23), we see that
    P(k + 1) = P(k) − P(k)h(k + 1)[h'(k + 1)P(k)h(k + 1) + w⁻¹(k + 1)]⁻¹h'(k + 1)P(k)   (4-27)
Consequently,
    K_w(k + 1) = P(k + 1)h(k + 1)w(k + 1)
        = [P − Ph(h'Ph + w⁻¹)⁻¹h'P]hw
        = Ph[I − (h'Ph + w⁻¹)⁻¹h'Ph]w
        = Ph(h'Ph + w⁻¹)⁻¹(h'Ph + w⁻¹ − h'Ph)w
        = Ph(h'Ph + w⁻¹)⁻¹
which is (4-25). In order to obtain (4-26), express (4-27) as
    P(k + 1) = P(k) − K_w(k + 1)h'(k + 1)P(k) = [I − K_w(k + 1)h'(k + 1)]P(k) □

Comments

1. The recursive formula for θ̂_WLS, (4-24), is unchanged from (4-8). Only the matrix recursion for P, leading to gain matrix K_w, has changed. A digital computer implementation of (4-24)-(4-26) proceeds as follows: P(k) → K_w(k + 1) → θ̂_WLS(k + 1) → P(k + 1). This order of computations differs from the preceding one.

2. When z(k) is a scalar, the covariance form of the recursive LSE requires no matrix inversions and only one division.

3. Equations (4-24)-(4-26) can also be used for k = 0, 1, …, N − 1 using the values for P(0) and θ̂_WLS(0) given in (4-20) and (4-21).

4. The reason why the results in Theorem 4-2 are referred to as the covariance form of the recursive LSE is deferred to Lesson 9, where connections are made between least-squares and best linear unbiased minimum-variance estimators (see p. 79).

WHICH FORM TO USE

The information form is often more useful than the covariance form in analytical studies. For example, it is used to derive the initial conditions for P⁻¹(0) and θ̂_WLS(0), which are given in (4-20) and (4-21) (see Mendel, 1973, pp. 101-106). The information form is also to be preferred over the covariance form during the startup of recursive least squares. We demonstrate why this is so next.

Problems

(c) Show that when the measurements z_a(−n), …, z_a(−1), z(0), z(1), …, z(l + 1) are used, then
    P⁻¹(l + 1) = I/a + Σ_{j=0}^{l+1} h(j)w(j)h'(j)
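The covariance form can be exercised the same way. The sketch below (Python, arbitrary test data) runs (4-24)-(4-26); note that for scalar z(k) the gain (4-25) needs only one division, and the result again matches the batch WLSE:

```python
import numpy as np

# Sketch of the covariance form (4-24)-(4-26); for scalar z(k) no matrix
# inversion is needed inside the loop.  Checked against the batch WLSE.
rng = np.random.default_rng(3)
n, N = 3, 15
H = rng.normal(size=(N, n))                  # rows are h'(k)
w = rng.uniform(0.5, 2.0, size=N)
theta = np.array([0.5, -1.0, 2.0])
z = H @ theta + rng.normal(scale=0.1, size=N)

# Initialize at k = n with the batch formulas.
Pinv0 = H[:n].T @ (w[:n, None] * H[:n])
P = np.linalg.inv(Pinv0)
theta_hat = P @ (H[:n].T @ (w[:n] * z[:n]))

for k in range(n, N):
    h = H[k]
    K = P @ h / (h @ P @ h + 1.0 / w[k])                 # (4-25): one division
    theta_hat = theta_hat + K * (z[k] - h @ theta_hat)   # (4-24)
    P = P - np.outer(K, h) @ P                           # (4-26): (I - K h')P

batch = np.linalg.solve(H.T @ (w[:, None] * H), H.T @ (w * z))
assert np.allclose(theta_hat, batch)
```
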
… formulas for the recursive WLSE, how they can be made independent of w(k + 1), when w(k + 1) = w for all k.

4-5. For the data in the accompanying table, do the following:
(a) Obtain the least-squares line y(t) = a + bt by means of the batch processing least-squares algorithm;
(b) Obtain the least-squares line by means of the recursive least-squares algorithm, using the recursive startup technique (let a = 10 and ε = 10⁻¹⁶).

    t    y(t)
    0     1
    1     5
    2     9
    3    11

Lesson 5
Least-Squares Estimation: Recursive Processing (continued)
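A sketch of Problem 4-5 follows (Python), assuming the w(0) = 0 form of the Lesson 4 startup values, P⁻¹(0) = I/a and θ̂(0) ≈ 0; the interpretation of the constants a and ε below is our assumption. The recursive startup result is compared with the batch least-squares line:

```python
import numpy as np

# Sketch of Problem 4-5 using the startup values (4-20)-(4-21) with
# a = 10 and eps = 1e-16, compared with the batch least-squares line.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 5.0, 9.0, 11.0])
H = np.column_stack([np.ones_like(t), t])      # rows h'(k) for y = a + b t

a_big, eps = 10.0, 1e-16
Pinv = np.eye(2) / a_big                       # startup P^{-1}(0), w(0) = 0 case
theta_hat = np.full(2, eps)                    # startup estimate, essentially 0

for k in range(len(t)):                        # unit weights, w(k) = 1
    h = H[k]
    Pinv = Pinv + np.outer(h, h)               # (4-10)
    K = np.linalg.solve(Pinv, h)               # (4-9) with w = 1
    theta_hat = theta_hat + K * (y[k] - h @ theta_hat)   # (4-8)

batch = np.linalg.lstsq(H, y, rcond=None)[0]
print(theta_hat, batch)    # both near (1.4, 3.4)
```

The startup estimate differs from the batch line only by a small regularization term of order 1/a, which is why P(0) is chosen as a diagonal matrix of very large numbers.
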
Example 5-1

In order to illustrate some of the Lesson 4 results, we shall obtain a recursive algorithm for the least-squares estimator of the scalar θ in the instrument calibration example (Lesson 3). Gain K_w(k + 1) is computed using (4-9) of Lesson 4, and P(k + 1) is computed using (4-13) of Lesson 4. Generally, we do not compute P(k + 1) using (4-13); but the simplicity of our example allows us to use this formula to obtain a closed-form expression for P(k + 1) in the most direct way. Recall that X = col (1, 1, …, 1), which is a k × 1 vector, and h(k + 1) = 1; thus, setting W(k) = I and w(k + 1) = 1 (in order to obtain the recursive LSE of θ) in the preceding formulas, we find that
    P(k + 1) = [X'(k + 1)X(k + 1)]⁻¹ = 1/(k + 1)
and
    θ̂(k + 1) = θ̂(k) + [z(k + 1) − θ̂(k)]/(k + 1)   (5-3)
Formula (S-3), which can be used for k = 0.1, . . . , iV - 1 by setting &CO) = 0, CROSS-SECTIONAL PROCESSING
lets us reinterpret the well-known sample mean estimator as a time-varying digital filter
[see, also, (l-2) of Lesson 11.We leave it to the reader to study the stability properties Suppose that at each sampling time tk+ 1 there are q sensors or groups of
of this first-order filter. sensors that provide our vector measurementdata. These sensors are cor-
Usually it is in only the simpiest 0: casesJhat we can obtain closed-form expres- rupted by noise that is uncorrelated from one sensor group to another. The
sions for Kw and P and subsequently &=,[or &Q]: however? we can always obtain m-dimensional vector z(k + 1) can be representedas
values for I&(k + 1) and P(k T I) at successivetime points using the results in
Theorems 4-l or 4-2. q z(k + 1) = co1(z,(k + I), z,(k + I), , , . , z,(k -t 1)) (5-j)
where
Zi(k + I) = H,(k + I)8 + vi(k I) (5-6)
GENERALIZAION TO VECTOR MEASUREMENTS
dimz,(k + 1) = mi X 1,
A vector of measurementscan occur in any application where it is possibleto 4
use more than one sensor; however, it is ako possible to obtain a vector of z Hli = f?l P-7)
i=l
measurementsfrom certain types of individual sensors.In spacecraftapplica-
tions, it is not unusual to be able to measureattitude, rate, and acceleration. E(v,(k + 1)) = 0, and
In electrical systemsapplications, it is not uncommon to be able to measure
voltages, currents, and power. Radar measurementsoften provide informa- E(v, (k + l)v,(k + I)) = Ri(k + I)&; P-8)
tion about range, azimuth, and elevation. Radar is an example of a single An alternative to processing all m measurements in one batch (i.e.,
sensorthat provides a vector of measurements. simultaneously) is available, and is one in which we freeze time at tl; +1 and
In the vector measurement case, (4-l) of Lesson 4 is changed from recursively process the 4 batches of measurementsone batch at a time. Data
z(k + I) = h(k + 110+ v(k + 1) to zl(k + I) are used to obtain an estimate [fo: notational simplicity, in this
sectionwe omit the subscript WLS or LS on 6) 8r(k + 1) with if&(k) k b(k) and
z(k + 1) = H(k + l)O + v(k + 1) z(k + 1) 4 zl(k + 1). When these calculationsare completed z?(k + I) is pro-
cessedto obtain the estimate &(k + 1). Estimate iI,(k 4 1) is used to initialize
where z is now an VI x 1 vector, H is m x PIand v is ?TIx 1.
&(k + 1). Each set of data is processedin this manner until the final set
We leave it to the reader to showthat all of the results in Lessons3 and 4
zq(k + 1) has been included. Then time is advanced to fk +2 and the cycle is
are unchanged in the vector measurementcase; but some notation must be
repeated, This type of processing is known as cross-sectional or sequential
altered (see Table 5-l).
processing. It is summarized in Figure 5-l and is contrasted with more usual
simultaneousrecursive processing in Figure 5-2.
TABLE 5-1  Transformations from Scalar to Vector Measurement Situations, and Vice-Versa

    Scalar Measurement              Vector of Measurements
    z(k + 1)                        z(k + 1), an m × 1 vector
    v(k + 1)                        v(k + 1), an m × 1 vector
    w(k + 1)                        W(k + 1), an m × m matrix
    h'(k + 1), a 1 × n matrix       H(k + 1), an m × n matrix
    Z(k), an N × 1 vector           Z(k), an Nm × 1 vector
    V(k), an N × 1 vector           V(k), an Nm × 1 vector
    W(k), an N × N matrix           W(k), an Nm × Nm matrix
    H(k), an N × n matrix           H(k), an Nm × n matrix

Source: Reprinted from Mendel, 1973, p. 110. Courtesy of Marcel Dekker, Inc., NY.

[Figure 5-1  Cross-sectional processing of m measurements. All of this processing is done at the frozen time point t_{k+1}.]
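The computational appeal of cross-sectional processing can be checked numerically. The sketch below is not from the text; it assumes the recursive WLSE update of Lesson 4 with unit weights and made-up data, and shows that absorbing an m-vector of measurements in one batch update and absorbing its components one scalar at a time (each step needing only a scalar division, never a matrix inverse) reach the same estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 4                      # parameters, measurements in the new batch
theta = rng.standard_normal(n)   # true parameter vector

# Prior estimate and matrix P(k) carried over from earlier processing
# (hypothetical values, for illustration only).
theta_prior = theta + 0.1 * rng.standard_normal(n)
P = np.eye(n)

H = rng.standard_normal((m, n))                 # H(k + 1), m x n
z = H @ theta + 0.01 * rng.standard_normal(m)   # z(k + 1) = H theta + v

# (a) Simultaneous processing: one m-dimensional update (unit weights, W = I).
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + np.eye(m))
theta_batch = theta_prior + K @ (z - H @ theta_prior)

# (b) Cross-sectional processing: freeze time and absorb one scalar
# measurement at a time; each step costs only a scalar division.
theta_seq, P_seq = theta_prior.copy(), P.copy()
for i in range(m):
    h = H[i]                                    # i-th 1 x n measurement row
    k_gain = P_seq @ h / (h @ P_seq @ h + 1.0)
    theta_seq = theta_seq + k_gain * (z[i] - h @ theta_seq)
    P_seq = P_seq - np.outer(k_gain, h @ P_seq)

print(np.allclose(theta_batch, theta_seq))      # the two routes agree
```

With uncorrelated measurement components (diagonal weighting), the agreement is exact in exact arithmetic, which is the content of the cross-sectional equivalence discussed in the text.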
38  Least-Squares Estimation: Recursive Processing (continued)    Lesson 5
[Figure 5-2  Two ways to reach θ̂(k + 1), θ̂(k + 2), …: (a) simultaneous recursive processing, performed along the line dim z = m, and (b) cross-sectional recursive processing where, for example, at t_{k+1} the processing is performed along the line TIME = t_{k+1} and stops when that line intersects the line dim z = m.]

The remarkable property about cross-sectional processing is that

    θ̂_q(k + 1) = θ̂(k + 1)    (5-9)

where θ̂(k + 1) is obtained from simultaneous recursive processing.

A very large computational advantage exists for cross-sectional processing if m_i = 1. In this case the matrix inverse [H(k + 1)P(k)H'(k + 1) + W⁻¹(k + 1)]⁻¹ needed in (4-25) of Lesson 4 is replaced by the division [h_i'(k + 1)P_i(k)h_i(k + 1) + 1/w_i(k + 1)]⁻¹. See Mendel, 1973, pp. 113–118 for a proof of (5-9).

MULTISTAGE LEAST-SQUARES ESTIMATORS

Suppose we are given a linear model Z(k) = H_1(k)θ_1 + V(k) with n_1 unknown parameters θ_1, datum {Z(k), H_1(k)}, and the LSE of θ_1, θ̂_{1,LS}(k), where

    θ̂_{1,LS}(k) = [H_1'(k)H_1(k)]⁻¹H_1'(k)Z(k)    (5-10)

We extend this model to include n_2 additional parameters, θ_2, so that our model is given by

    Z(k) = H_1(k)θ_1 + H_2(k)θ_2 + V(k)    (5-11)

For this model, datum {Z(k), H_1(k), H_2(k)} is available. We wish to compute the least-squares estimates of θ_1 and θ_2 for the n_1 + n_2 parameter model using the previously computed θ̂_{1,LS}(k).

Theorem 5-1.  Given the linear model in (5-11), where θ_1 is n_1 × 1 and θ_2 is n_2 × 1, the LSEs of θ_1 and θ_2 based on datum {Z(k), H_1(k), H_2(k)} are found from the following equations [(5-12)–(5-14) are illegible in this copy]:

    C(k) = [H_2'(k)H_2(k) − H_2'(k)H_1(k)G(k)]⁻¹    (5-15)

The results in this theorem were worked out by Åström (1968) and emphasize operations which are performed on the vector of residuals for the n_1 parameter model, namely Z(k) − H_1(k)θ̂_{1,LS}(k). Other forms for θ̂_{1,LS}(k) and θ̂_{2,LS}(k) appear in Mendel (1975).

Proof.  The derivation of (5-12) and (5-13) is based primarily on the block decomposition method for inverting H'(k)H(k), where H(k) = [H_1(k) | H_2(k)]. See Mendel (1975) for the details. ∎

Similar results to those in Theorem 5-1 can be developed for the removal of n_2 parameters from a model, for adding or removing parameters one at a time, and for recursive-in-time versions of all these results (see Mendel, 1975). All of these results are referred to as multistage LSEs.

We conclude this section and lesson with some examples which illustrate problems for which multistage algorithms can be quite useful.

Example 5-2  Identification of a Sampled Impulse Response: Zoom-In Algorithm

A test signal, u(t), is applied at t = t_0 to a linear, time-invariant, causal, but unknown system whose output, y(t), and input are measured. The unknown impulse response, w(t), is to be identified using sampled values of u(t) and y(t). For such a system

    y(t_k) = ∫_{t_0}^{t_k} w(τ)u(t_k − τ) dτ    (5-16)
One approach to identifying w(t) is to discretize (5-16) and to identify w(t) only at discrete values of time. If we assume that (1) w(t) ≈ 0 for all t ≥ t_n, (2) [t_0, t_n] is divided into n equal intervals, each of width T, so that n = (t_n − t_0)/T, and (3) for τ ∈ [t_{i−1}, t_i], w(τ) = w_1(t_{i−1}) and u(t − τ) = u(t − t_{i−1}), then

    y(t_k) = Σ_{i=1}^{n} w_1(t_{i−1})u(t_k − t_{i−1})T    (5-17)

It is straightforward to identify the n unknown parameters w_1(t_0), w_1(t_1), …, w_1(t_{n−1}) via least squares (see Example 2-1 of Lesson 2); however, for n to be known, t_n must be accurately known and T must be chosen most judiciously. In actual practice t_n is not known that accurately, so that n may have to be varied. Multistage LSEs can be used to handle this situation. Sometimes T can be too coarse for certain regions of time, in which case significant features of w(t), such as a ripple, may be obscured. In this situation, we would like to zoom in on those intervals of time and rediscretize y(t) just over those intervals, thereby adding more terms to (5-17). Multistage LSEs can also be used to handle this situation, as we demonstrate next.

For illustrative purposes, we present this procedure for the case when the interval of interest equals T, i.e., when t ∈ [t_x, t_{x+1}], which is further divided into q equal intervals, each of width ΔT_x, so that q = (t_{x+1} − t_x)/ΔT_x = T/ΔT_x. Observe that

    ∫_{t_x}^{t_{x+1}} w(τ)u(t_k − τ) dτ = Σ_{j=0}^{q−1} ∫_{t_x+jΔT_x}^{t_x+(j+1)ΔT_x} w(τ)u(t_k − τ) dτ    (5-18)

Applying the piecewise-constant approximations of (5-17) on each of these subintervals leads to

    y(t_k) = Σ_{i=1, i≠x+1}^{n} w_1(t_{i−1})u(t_k − t_{i−1})T + Σ_{j=0}^{q−1} w_2(t_x + jΔT_x)u(t_k − t_x − jΔT_x)ΔT_x    (5-20)

Equation (5-20) contains n + q − 1 parameters. Let

    θ_1 = col(w(t_0), w(t_1), …, w(t_x), …, w(t_{n−1}))    (5-22)

and assume that a least-squares estimate θ̂_{1,LS}, which is based on (5-17), is available. Let

    θ_2 = col[w_2(t_x + ΔT_x), w_2(t_x + 2ΔT_x), …, w_2(t_x + (q − 1)ΔT_x)]    (5-23)

To obtain θ̂_LS = col(θ̂_{1,LS}, θ̂_{2,LS}), proceed as follows: (1) modify θ̂_{1,LS} by scaling θ̂_{1,LS}(t_x) to θ̂_{1,LS}(t_x)ΔT_x/T and call the modified result θ̂*_{1,LS}; and (2) apply Theorem 5-1 to obtain θ̂_LS. Note that the scaling of θ̂_{1,LS}(t_x) in Step 1 is due to the appearance of w(t_x)ΔT_x/T instead of w(t_x) in (5-20).

It is straightforward to extend the approach of this example to regions that include more than one sampling interval, T. ∎

[Table 5-2  Average estimates of parameters for impulse response models having n = 1 to 10 parameters, together with the average performance of each model; the numerical entries are not legible in this copy.]
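The two-stage idea of Theorem 5-1, reusing θ̂_1,LS and the residual Z(k) − H_1(k)θ̂_1,LS(k) when new parameters are appended, can be illustrated with the standard partitioned (Frisch-Waugh) form of the normal equations. The sketch below is my own construction with made-up data; the specific update formulas are assumptions, not the book's (5-12)–(5-14), but they recover exactly the same answer as a joint least-squares fit of all n_1 + n_2 parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n1, n2 = 50, 3, 2

H1 = rng.standard_normal((N, n1))
H2 = rng.standard_normal((N, n2))
theta1, theta2 = rng.standard_normal(n1), rng.standard_normal(n2)
Z = H1 @ theta1 + H2 @ theta2 + 0.05 * rng.standard_normal(N)

# Stage 1: LSE for the n1-parameter model only, as in (5-10).
G1 = np.linalg.inv(H1.T @ H1) @ H1.T
theta1_stage1 = G1 @ Z

# Stage 2: project H2 and Z off the column space of H1, then solve a
# small n2 x n2 system for theta2 (block-decomposition idea of the proof).
M1 = np.eye(N) - H1 @ G1                  # annihilates anything in span(H1)
theta2_hat = np.linalg.solve(H2.T @ M1 @ H2, H2.T @ M1 @ Z)
# Correct the stage-1 estimate for the newly added parameters.
theta1_hat = theta1_stage1 - G1 @ H2 @ theta2_hat

# Reference: joint LSE of all n1 + n2 parameters in one batch.
theta_joint, *_ = np.linalg.lstsq(np.hstack([H1, H2]), Z, rcond=None)

print(np.allclose(np.concatenate([theta1_hat, theta2_hat]), theta_joint))
```

Note that M1 @ Z is precisely the stage-1 residual vector, so the second stage operates only on what the n_1-parameter model failed to explain.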
Example 5-3  Impulse Response Identification

An n-stage batch LSE was used to estimate impulse response models having from one to ten parameters [i.e., n in (5-17) ranged from 1 to 10] for a second-order system (natural frequency of 10.47 rad/sec, damping ratio of 0.12, and unity gain). The system was forced with an input of randomly chosen ±1s, each of which was equally likely to occur. The system's output was corrupted by zero-mean pseudo-random Gaussian white noise of unity variance. Fifty consecutive samples of the noisy output and noise-free input were then processed by the n-stage batch LSE, and this procedure was repeated ten times, from which the average values for the parameter estimates given in Table 5-2 (page 41) were obtained.

The ten models were tested using ten input sequences, each containing 50 samples of randomly chosen ±1s. The last column in Table 5-2 gives values for the normalized average performance [the formula, a ratio of sums over the 50 samples raised to the one-half power, is illegible in this copy], in which ȳ(t_i) denotes an average over the ten runs. Not too surprisingly, we see that average predicted performance improves as n increases.

All the θ̂_i results were obtained in one pass through the n-stage LSE using approximately 3150 flops. The same results could have been obtained using 10 LSEs (Lesson 3); but that would have required approximately 5220 flops. ∎

PROBLEMS

5-1.  Show that, in the vector measurement case, the results given in Lessons 3 and 4 only need to be modified using the transformations listed in Table 5-1.
5-2.  Prove that, using cross-sectional processing, θ̂_q(k + 1) = θ̂(k + 1).
5-3.  Prove the multistage least-squares estimation Theorem 5-1.
5-4.  Extend Example 5-2 to regions of interest equal to mT, where m is a positive integer and T is the original data sampling time.

LESSON 6  SMALL SAMPLE PROPERTIES OF ESTIMATORS

INTRODUCTION

How do we know whether or not the results obtained from the LSE, or for that matter any estimator, are good? To answer this question, we make use of the fact that all estimators represent transformations of random data. For example, our LSE, [H'(k)W(k)H(k)]⁻¹H'(k)W(k)Z(k), represents a linear transformation on Z(k). Other estimators may represent nonlinear transformations of Z(k). The consequence of this is that θ̂(k) is itself random. Its properties must therefore be studied from a statistical viewpoint.

In the estimation literature, it is common to distinguish between small-sample and large-sample properties of estimators. The term sample refers to the number of measurements used to obtain θ̂, i.e., the dimension of Z. The phrase small-sample means any number of measurements (e.g., 1, 2, 100, 10⁴, or even an infinite number), whereas the phrase large-sample means an infinite number of measurements. Large-sample properties are also referred to as asymptotic properties. It should be obvious that if an estimator possesses a small-sample property, it also possesses the associated large-sample property; but the converse is not always true.

Why bother studying large-sample properties of estimators if these properties are included in their small-sample properties? Put another way, why not just study small-sample properties of estimators? For many estimators it is relatively easy to study their large-sample properties and virtually impossible to learn about their small-sample properties. An analogous situation occurs in stability theory, where most effort is directed at infinite-time stability behavior rather than at finite-time behavior.
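The claim that θ̂(k) is itself random, because it is a transformation of random data, is easy to see in simulation. A minimal sketch (my own numbers, with a fixed deterministic H): repeating the identical least-squares experiment with fresh noise gives a different estimate every time, and it is the statistics of this scatter that the rest of the lesson studies.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 20, 2
H = rng.standard_normal((N, n))   # fixed (deterministic) H(k)
theta = np.array([1.0, -2.0])     # true parameters

# 2000 independent realizations of Z(k) = H theta + V(k); each yields a
# different LSE, so theta_hat is a random vector with its own statistics.
estimates = np.empty((2000, n))
for r in range(2000):
    Z = H @ theta + rng.standard_normal(N)
    estimates[r] = np.linalg.lstsq(H, Z, rcond=None)[0]

print(np.abs(estimates.mean(axis=0) - theta).max() < 0.05)  # scatter centers near truth
print(estimates.std(axis=0).min() > 0.05)                   # but each run's estimate varies
```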
Although large-sample means an infinite number of measurements, estimators begin to enjoy their large-sample properties for much fewer than an infinite number of measurements. How few usually depends on the dimension of θ, n.

A thorough study into θ̂ would mean determining its probability density function p(θ̂). Usually, it is too difficult to obtain p(θ̂) for most estimators (unless θ̂ is multivariate Gaussian); thus, it is customary to emphasize the first- and second-order statistics of θ̂ (or its associated error θ̃ = θ − θ̂), namely the mean and covariance.

We shall examine the following small- and large-sample properties of estimators: unbiasedness and efficiency (small-sample), and asymptotic unbiasedness, consistency, and asymptotic efficiency (large-sample). Small-sample properties are the subject of this lesson, whereas large-sample properties are studied in Lesson 7.

UNBIASEDNESS

In terms of estimation error, θ̃(k), unbiasedness means that

    E{θ̃(k)} = 0  for all k

Example 6-1

In the instrument calibration example of Lesson 3, we determined the following LSE of θ:

    θ̂_LS(N) = (1/N) Σ_{i=1}^{N} z(i)

where

    z(i) = θ + v(i)

Suppose E{v(i)} = 0 for i = 1, 2, …, N; then,

    E{θ̂_LS(N)} = θ + (1/N) Σ_{i=1}^{N} E{v(i)} = θ

which means that θ̂_LS(N) is an unbiased estimator of θ. ∎

Many estimators are linear transformations of the measurements, i.e.,

    θ̂(k) = F(k)Z(k)    (6-6)

In least-squares, we obtained this linear structure for θ̂(k) by solving an optimization problem. Sometimes, we begin by assuming that (6-6) is the desired structure for θ̂(k). We now address the question: when is F(k)Z(k) an unbiased estimator of deterministic θ?

Theorem 6-1.  When Z(k) = H(k)θ + V(k), E{V(k)} = 0, and H(k) is deterministic, then θ̂(k) = F(k)Z(k) [where F(k) is deterministic] is an unbiased estimator of θ if and only if

    F(k)H(k) = I  for all k    (6-7)

Note that this is the first place where we have had to assume any a priori knowledge about noise V(k).

Proof
a. (Necessity). From the model for Z(k) and the assumed structure for θ̂(k), we see that

    θ̂(k) = F(k)H(k)θ + F(k)V(k)    (6-8)

If θ̂(k) is an unbiased estimator of θ, F(k) and H(k) are deterministic, and E{V(k)} = 0, then E{θ̂(k)} = θ = F(k)H(k)θ, so that

    [I − F(k)H(k)]θ = 0    (6-9)

Obviously, for θ ≠ 0, (6-7) is the solution to this equation.
b. (Sufficiency). From (6-8) and the nonrandomness of F(k) and H(k), we have

    E{θ̂(k)} = F(k)H(k)θ    (6-10)

Assuming the truth of (6-7), it must be that E{θ̂(k)} = θ, which, of course, means that θ̂(k) is an unbiased estimator of θ. ∎

Example 6-2

Matrix F(k) for the WLSE of θ is [H'(k)W(k)H(k)]⁻¹H'(k)W(k). Observe that this F(k) matrix satisfies (6-7); thus, when H(k) is deterministic, the WLSE of θ is unbiased. Unfortunately, in many interesting applications H(k) is random, and we cannot apply Theorem 6-1 to study the unbiasedness of the WLSE. We return to this issue in Lesson 8. ∎
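Theorem 6-1 and Example 6-2 invite a direct check: build F(k) for the WLSE, confirm condition (6-7), and confirm empirically that the estimate averages to θ over zero-mean noise. A minimal sketch with made-up H(k) and weights:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 12, 3
H = rng.standard_normal((N, n))          # deterministic H(k)
W = np.diag(rng.uniform(0.5, 2.0, N))    # positive-definite weighting

# F(k) for the WLSE: [H'WH]^{-1} H'W  (Example 6-2).
F = np.linalg.inv(H.T @ W @ H) @ H.T @ W

# Condition (6-7): F(k)H(k) = I, which guarantees unbiasedness
# when E{V(k)} = 0 and H(k) is deterministic.
print(np.allclose(F @ H, np.eye(n)))

# Empirical check: average the estimate over many zero-mean noise draws.
theta = np.array([0.5, -1.0, 2.0])
est = np.mean([F @ (H @ theta + rng.standard_normal(N)) for _ in range(5000)],
              axis=0)
print(np.max(np.abs(est - theta)) < 0.05)
```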
Suppose that we begin by assuming a linear recursive structure for θ̂(k + 1), namely

    θ̂(k + 1) = A(k + 1)θ̂(k) + b(k + 1)z(k + 1)    (6-11)

We then have the following counterpart to Theorem 6-1.

Theorem 6-2.  When z(k + 1) = h'(k + 1)θ + v(k + 1), E{v(k + 1)} = 0, and h(k + 1) is deterministic, then θ̂(k + 1) given by (6-11) is an unbiased estimator of θ if

    A(k + 1) = I − b(k + 1)h'(k + 1)    (6-12)

where A(k + 1) and b(k + 1) are deterministic. ∎

We leave the proof of this result to the reader. Unbiasedness means that our recursive estimator does not have two independent design matrices (degrees of freedom), A(k + 1) and b(k + 1); unbiasedness constrains A(k + 1) to be a function of b(k + 1). When (6-12) is substituted into (6-11), we obtain the following important structure for an unbiased linear recursive estimator of θ:

    θ̂(k + 1) = θ̂(k) + b(k + 1)[z(k + 1) − h'(k + 1)θ̂(k)]    (6-13)

Our recursive WLSE of θ has this structure; thus, as long as h(k + 1) is deterministic, it produces unbiased estimates of θ. Many other estimators that we shall study will also have this structure.

EFFICIENCY

Did you hear the story about the conventioning statisticians who all drowned in a lake that was on the average 6 in. deep? The point of this rhetorical question is that unbiasedness by itself is not terribly meaningful. We must also study the dispersion about the mean, namely the variance. If the statisticians had known that the variance about the 6 in. average depth was 120 ft, they might not have drowned!

Ideally, we would like our estimator to be unbiased and to have the smallest possible error variance. We consider the case of a scalar parameter first.

Definition 6-2.  An unbiased estimator, θ̂(k), of θ is said to be more efficient than any other unbiased estimator, θ̂*(k), of θ, if

    var(θ̂(k)) ≤ var(θ̂*(k))  for all k    (6-14) ∎

Very often, it is of interest to know if θ̂(k) satisfies (6-14) for all other unbiased estimators, θ̂*(k). This can be verified by comparing the variance of θ̂(k) with the smallest error variance that can ever be attained by any unbiased estimator. The following theorem provides a lower bound for E{θ̃²(k)} when θ is a scalar deterministic parameter. Theorem 6-4 generalizes these results to the case of a vector of deterministic parameters.

Theorem 6-3 (Cramér-Rao Inequality).  Let z denote a set of data [i.e., z = col(z_1, z_2, …, z_k); z is also short for Z(k)]. If θ̂(k) is an unbiased estimator of deterministic θ, then

    E{θ̃²(k)} ≥ 1 / E{[∂ ln p(z)/∂θ]²}  for all k    (6-15)

Two other ways for expressing (6-15) are

    E{θ̃²(k)} ≥ 1 / ∫ [∂p(z)/∂θ]²/p(z) dz  for all k    (6-16)

and

    E{θ̃²(k)} ≥ 1 / [−E{∂² ln p(z)/∂θ²}]  for all k    (6-17)

where dz is short for dz_1 dz_2 ⋯ dz_k. ∎

Inequalities (6-15), (6-16), and (6-17) are named after Cramér and Rao, who discovered them. They are functions of k because z is.

Before proving this theorem, it is instructive to illustrate its use by means of an example.

Example 6-3

We are given M statistically independent observations of a random variable z that is known to have a Cauchy distribution, i.e.,

    p(z_i) = 1 / {π[1 + (z_i − θ)²]}    (6-18)

Parameter θ is unknown and will be estimated using z_1, z_2, …, z_M. We shall determine the lower bound for the error-variance of any unbiased estimator of θ using (6-15). Observe that we are able to do this without having to specify an estimator structure for θ̂. Without further explanation, we calculate:

    p(z) = Π_{i=1}^{M} p(z_i) = 1 / {π^M Π_{i=1}^{M} [1 + (z_i − θ)²]}    (6-19)

    ln p(z) = −M ln π − Σ_{i=1}^{M} ln[1 + (z_i − θ)²]    (6-20)
Differentiating (6-20),

    ∂ ln p(z)/∂θ = Σ_{i=1}^{M} 2(z_i − θ)/[1 + (z_i − θ)²]    (6-21)

so that

    E{[∂ ln p(z)/∂θ]²} = E{[Σ_{i=1}^{M} 2(z_i − θ)/[1 + (z_i − θ)²]]²}    (6-22)

Next, we must evaluate the right-hand side of (6-22). This is tedious to do, but can be accomplished as follows. Note that

    E{[∂ ln p(z)/∂θ]²} = T_A + T_B    (6-23)

Consider T_A first, i.e.,

    T_A = E{Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} 2(z_i − θ)/[1 + (z_i − θ)²] · 2(z_j − θ)/[1 + (z_j − θ)²]}    (6-24)

where we made use of statistical independence of the measurements. Observe that

    E{2(z_i − θ)/[1 + (z_i − θ)²]} = (2/π) ∫_{−∞}^{∞} y/(1 + y²)² dy = 0    (6-25)

where y = z_i − θ. The integral is zero because the integrand is an odd function of y. Consequently,

    T_A = 0    (6-26)

Next, consider T_B, i.e.,

    T_B = E{Σ_{i=1}^{M} 4(z_i − θ)²/[1 + (z_i − θ)²]²}    (6-27)

which can also be written as

    T_B = 4 Σ_{i=1}^{M} T_C = 4M T_C    (6-28)

where

    T_C = E{(z_i − θ)²/[1 + (z_i − θ)²]²} = (1/π) ∫_{−∞}^{∞} y²/(1 + y²)³ dy = 1/8    (6-29)

Thus, when (6-23) is substituted into (6-22), and that result is substituted into (6-15), we find that

    E{θ̃²(k)} ≥ 2/M  for all M    (6-33)

Observe that the Cramér-Rao bound depends on the number of measurements used to estimate θ. For large numbers of measurements, this bound tends to zero. ∎

Proof of Theorem 6-3.  Because θ̂(k) is an unbiased estimator of θ,

    E{θ̃(k)} = ∫_{−∞}^{∞} [θ̂(k) − θ] p(z) dz = 0    (6-34)

Differentiating (6-34) with respect to θ, we find that

    ∫_{−∞}^{∞} [θ̂(k) − θ] (∂p(z)/∂θ) dz − ∫_{−∞}^{∞} p(z) dz = 0    (6-35)

which can be rewritten as

    ∫_{−∞}^{∞} [θ̂(k) − θ] (∂p(z)/∂θ) dz = 1    (6-36)

so that, using the identity

    ∂p(z)/∂θ = [∂ ln p(z)/∂θ] p(z)    (6-37)

Substitute (6-37) into (6-36) to obtain

    ∫_{−∞}^{∞} [θ̂(k) − θ] [∂ ln p(z)/∂θ] p(z) dz = 1    (6-38)

Applying the Schwarz inequality

    [∫ a(z)b(z) dz]² ≤ ∫ a²(z) dz ∫ b²(z) dz    (6-39)

with a(z) = [θ̂(k) − θ]√p(z) and b(z) = √p(z) ∂ ln p(z)/∂θ, we find from (6-38) that

    1 ≤ E{[θ̂(k) − θ]²} E{[∂ ln p(z)/∂θ]²}    (6-40)

Finally, to obtain (6-15), solve (6-40) for E{θ̃²(k)}.

In order to obtain (6-16) from (6-15), observe that

    E{[∂ ln p(z)/∂θ]²} = ∫ [∂ ln p(z)/∂θ]² p(z) dz = ∫ [∂p(z)/∂θ]²/p(z) dz    (6-41)

To obtain (6-41) we have also used (6-37).

In order to obtain (6-17), we begin with the identity

    ∫_{−∞}^{∞} p(z) dz = 1

and differentiate it twice with respect to θ, using (6-37) after each differentiation, to show that

    E{[∂ ln p(z)/∂θ]²} = −E{∂² ln p(z)/∂θ²}    (6-42)

Substitute (6-42) into (6-15) to obtain (6-17). ∎

It is sometimes easier to compute the Cramér-Rao bound using one form [i.e., (6-15) or (6-16) or (6-17)] than another. The logarithmic forms are usually used when p(z) is exponential (e.g., Gaussian).

Corollary 6-1.  If the lower bound is achieved in Theorem 6-3, then

    θ̂(k) − θ = (1/c) ∂ ln p(z)/∂θ    (6-43)

where c is an arbitrary constant.

Proof.  In deriving the Cramér-Rao bound we used the Schwarz inequality (6-39), for which equality is achieved when b(z) = c a(z). In our case a(z) = [θ̂(k) − θ]√p(z) and b(z) = √p(z) ∂ ln p(z)/∂θ. Setting b(z) = c a(z), we obtain (6-43). ∎

Equation (6-43) links the structure of an estimator to the property of efficiency, because the left-hand side of (6-43) depends explicitly on θ̂(k).

We turn next to the general case of a vector of parameters.

Definition 6-3.  An unbiased estimator, θ̂(k), of vector θ is said to be more efficient than any other unbiased estimator, θ̂*(k), of θ, if

    E{[θ − θ̂(k)][θ − θ̂(k)]'} ≤ E{[θ − θ̂*(k)][θ − θ̂*(k)]'}    (6-44)

For a vector of parameters, we see that a more efficient estimator has the smallest error covariance among all unbiased estimators of θ, smallest in the sense that E{[θ − θ̂(k)][θ − θ̂(k)]'} − E{[θ − θ̂*(k)][θ − θ̂*(k)]'} is negative semi-definite.

The generalization of the Cramér-Rao inequality to a vector of parameters is given next.

Theorem 6-4 (Cramér-Rao inequality for a vector of parameters).  Let z denote a set of data as in Theorem 6-3, and θ̂(k) be any unbiased estimator of deterministic θ based on z. Then

    E{θ̃(k)θ̃'(k)} ≥ J⁻¹  for all k    (6-45)

where J is the Fisher information matrix,

    J = E{[∂ ln p(z)/∂θ][∂ ln p(z)/∂θ]'}    (6-46)

which can also be expressed as

    J = −E{∂² ln p(z)/∂θ²}    (6-47)

Equality holds in (6-45) if and only if

    ∂ ln p(z)/∂θ = c(θ)θ̃(k)    (6-48) ∎

A complete proof of this result is given by Sorenson (1980, pp. 94–96). Although the proof is similar to our proof of Theorem 6-3, it is a bit more intricate because of the vector nature of θ.

Inequality (6-45) demonstrates that any unbiased estimator can have a covariance no smaller than J⁻¹. Unfortunately, J⁻¹ is not a greatest lower bound for the error covariance. Other bounds exist which are tighter than (6-45) [e.g., the Bhattacharyya bound (Van Trees, 1968)], but they are even more difficult to compute than J⁻¹.

Corollary 6-2.  Let z denote a set of data as in Theorem 6-3, and θ̂_i(k) be any unbiased estimator of deterministic θ_i based on z. Then

    E{θ̃_i²(k)} ≥ (J⁻¹)_ii,  i = 1, 2, …, n and all k    (6-49)

where (J⁻¹)_ii is the ii-th element in matrix J⁻¹.

Proof.  Inequality (6-45) means that E{θ̃(k)θ̃'(k)} − J⁻¹ is a positive semi-definite matrix, i.e.,

    a'[E{θ̃(k)θ̃'(k)} − J⁻¹]a ≥ 0    (6-50)

where a is an arbitrary nonzero vector. Choosing a = e_i (the ith unit vector), we obtain (6-49). ∎
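A Monte Carlo companion to Example 6-3 (my own experiment, not the book's): the sample median is a reasonable estimator of a Cauchy location parameter (the sample mean is useless here, since a Cauchy distribution has no mean), and its error variance must sit above the Cramér-Rao bound 2/M of (6-33). Asymptotically the median's variance is π²/4M, so the bound is respected but not attained.

```python
import numpy as np

rng = np.random.default_rng(4)
M, runs, theta = 201, 4000, 1.5

# M independent Cauchy(theta) samples per run; estimate theta by the median.
samples = theta + rng.standard_cauchy((runs, M))
medians = np.median(samples, axis=1)

emp_var = np.var(medians)
crlb = 2.0 / M                      # Cramer-Rao bound from (6-33)

print(emp_var > crlb)               # the bound is respected ...
print(abs(emp_var - np.pi**2 / (4 * M)) < 0.5 / M)  # ... with variance near pi^2/4M
```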
Results similar to those in Theorem 6-4 and Corollary 6-2 are also available for a vector of random parameters (e.g., Sorenson, 1980, pp. 99–100). Let p(z, θ) denote the joint probability density function between z and θ. The Cramér-Rao inequality for random parameters is obtained from Theorems 6-3 and 6-4 by replacing p(z) by p(z, θ). Of course, the expectation is now with respect to z and θ.

PROBLEMS

6-1.  Prove Theorem 6-2, which provides an unbiasedness constraint for the two design matrices that appear in a linear recursive estimator.
6-2.  Prove Theorem 6-4, which provides the Cramér-Rao bound for a vector of parameters.
6-3.  Random variable X ~ N(x; μ, σ²), and we are given a random sample {x_1, x_2, …, x_N}. Consider the following estimator for μ. [The estimator, and problems 6-4 through 6-8, are illegible in this copy.]
6-9.  Suppose θ̂(k) is a biased estimator of deterministic θ, with bias B(θ). Show that

    E{θ̃²(k)} ≥ [1 + dB(θ)/dθ]² / E{[∂ ln p(z)/∂θ]²}  for all k
LESSON 7  LARGE SAMPLE PROPERTIES OF ESTIMATORS

INTRODUCTION

To begin, we reiterate the fact that if an estimator possesses a small-sample property it also possesses the associated large-sample property; but the converse is not always true. In this lesson we shall examine the following large-sample properties of estimators: asymptotic unbiasedness, consistency, and asymptotic efficiency. The first and third properties are natural extensions of the small-sample properties of unbiasedness and efficiency in the limiting situation of an infinite number of measurements. The second property is about convergence of θ̂(k) to θ.

Before embarking on a discussion of these three large-sample properties, we digress a bit to introduce the concept of asymptotic distribution and its associated asymptotic mean and variance (or covariance, in the vector situation). Doing this will help us better understand these large-sample properties.

ASYMPTOTIC DISTRIBUTIONS

… asymptotic distribution of the estimator in question. … What is meant by the asymptotic distribution is not the ultimate form of the distribution, which may be degenerate, but the form that the distribution tends to put on in the last part of its journey to the final collapse (if this occurs). Consider the situation depicted in Figure 7-1, where p_i(θ̂) denotes the probability density function associated with estimator θ̂ of the scalar parameter θ, based on i measurements. As the number of measurements increases, p_i(θ̂) changes its shape (although, in this example, each one of the density functions is Gaussian). The density function eventually centers itself about the true parameter value θ, and the variance associated with p_i(θ̂) tends to get smaller as i increases. Ultimately, the variance will become so small that in all probability θ̂ = θ. The asymptotic distribution refers to p_i(θ̂) as it evolves from i = 1, 2, …, especially for large values of i.

The preceding example illustrates one of the three possible cases that can occur for an asymptotic distribution, namely the case when an estimator has a distribution of the same form regardless of the sample size, and this form is known (e.g., Gaussian). Some estimators have a distribution that, although not necessarily always of the same form, is also known for every sample size. For example, p_5(θ̂) may be uniform, p_20(θ̂) may be Rayleigh, and p_200(θ̂) may be Gaussian. Finally, for some estimators the distribution is not necessarily known for every sample size, but is known only for k → ∞.

Asymptotic distributions, like other distributions, are characterized by their moments. We are especially interested in their first two moments, namely, the asymptotic mean and variance.

Definition 7-1.  The asymptotic mean is equal to the asymptotic expectation, namely lim_{k→∞} E{θ̂(k)}. ∎

[Figure 7-1  Probability density function for the estimate of scalar θ, as a function of the number of measurements; e.g., p_20 is the p.d.f. for 20 measurements.]
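Figure 7-1's collapsing family of densities can be reproduced for the simplest case (my example, not the book's): the sample mean of i unit-variance Gaussian measurements has p_i(θ̂) = N(θ̂; θ, 1/i), so the density stays Gaussian while centering on θ and narrowing as i grows.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, runs = 2.0, 3000

# Sample-mean estimator from i measurements; p_i(theta_hat) is N(theta, 1/i),
# so the density keeps its Gaussian form but collapses toward theta.
for i in (1, 20, 200):
    est = rng.normal(theta, 1.0, (runs, i)).mean(axis=1)
    print(i, abs(est.mean() - theta) < 0.1, est.var() < 2.0 / i)
```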
As noted in Goldberger (1964, pg. 116), if E{θ̂(k)} = m for all k, then lim_{k→∞} E{θ̂(k)} = lim_{k→∞} m = m. Alternatively, suppose that

    E{θ̂(k)} = m + k⁻¹c_1 + k⁻²c_2 + ⋯    (7-1)

where the c's are finite constants; then,

    lim_{k→∞} E{θ̂(k)} = lim_{k→∞} {m + k⁻¹c_1 + k⁻²c_2 + ⋯} = m    (7-2)

Thus, if E{θ̂(k)} is expressible as a power series in k⁰, k⁻¹, k⁻², …, the asymptotic mean of θ̂(k) is the leading term of this power series; as k → ∞ the terms of higher order of smallness in k vanish.

Definition 7-2.  The asymptotic variance, which is short for variance of the asymptotic distribution, is not equal to lim_{k→∞} var[θ̂(k)]. It is defined as

    asymptotic var[θ̂(k)] = (1/k) lim_{k→∞} E{k[θ̂(k) − lim_{k→∞} E{θ̂(k)}]²}    (7-3) ∎

Kmenta (1971, pg. 164) states: "The asymptotic variance … is not equal to lim_{k→∞} var(θ̂). The reason is that in the case of estimators whose variance decreases with an increase in k, the variance will approach zero as k → ∞. This will happen when the distribution collapses on a point. But, as we explained, the asymptotic distribution is not the same as the collapsed (degenerate) distribution, and its variance is not zero."

Goldberger (1964, pg. 116) notes that if E{[θ̂(k) − lim_{k→∞} E{θ̂(k)}]²} = v/k for all values of k, then asymptotic var[θ̂(k)] = v/k. Alternatively, suppose that

    E{[θ̂(k) − lim_{k→∞} E{θ̂(k)}]²} = k⁻¹v + k⁻²c_1 + k⁻³c_2 + ⋯    (7-4)

then

    asymptotic var[θ̂(k)] = v/k    (7-5)

because, as k goes to infinity, the terms of higher order of smallness in k vanish. Observe that if (7-5) is true then the asymptotic variance of θ̂(k) decreases as k → ∞, which corresponds to the situation depicted in Figure 7-1.

Extensions of our definitions of asymptotic mean and variance to sequences of random vectors [e.g., θ̂(k), k = 1, 2, …] are straightforward and can be found in Goldberger (1964, pg. 117).

ASYMPTOTIC UNBIASEDNESS

Definition 7-3.  Estimator θ̂(k) is an asymptotically unbiased estimator of deterministic θ, if

    lim_{k→∞} E{θ̂(k)} = θ    (7-6)

or of random θ, if

    lim_{k→∞} E{θ̂(k)} = E{θ}    (7-7) ∎

Figure 7-1 depicts an example in which the asymptotic mean of θ̂ has converged to θ (note that p_200 is centered about θ).

Note that for the calculation on the left-hand side of either (7-6) or (7-7) to be performed, the asymptotic distribution of θ̂(k) must exist, because

    E{θ̂(k)} = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} θ̂(k) p(θ̂(k)) dθ̂(k)    (7-8)

Example 7-1

Recall our linear model Z(k) = H(k)θ + V(k), in which E{V(k)} = 0. Let us assume that each component of V(k) is uncorrelated and has the same variance σ_v². In Lesson 8, we determine an unbiased estimator for σ_v². Here, on the other hand, we just assume that

    σ̂_v²(k) = (1/k) Z̃'(k)Z̃(k)    (7-9)

where Z̃(k) = Z(k) − H(k)θ̂_LS(k). We leave it to the reader to show that (see Lesson 8)

    E{σ̂_v²(k)} = [(k − n)/k] σ_v²    (7-10)

Observe that σ̂_v²(k) is not an unbiased estimator of σ_v²; but it is an asymptotically unbiased estimator, because lim_{k→∞} [(k − n)/k] σ_v² = σ_v². ∎

Definition 7-4.  The probability limit of θ̂(k) is the point θ* on which the distribution of our estimator collapses. We abbreviate probability limit of θ̂(k) by plim θ̂(k). Mathematically speaking,

    plim θ̂(k) = θ*  ⟺  lim_{k→∞} Pr[|θ̂(k) − θ*| ≥ ε] = 0    (7-11)

where ε is a small positive number. ∎
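Example 7-1's asymptotic unbiasedness can be seen numerically. The sketch below uses my own data; it takes σ̂_v²(k) to be the residual sum of squares divided by k, consistent with (7-10), and shows the estimator biased low at small k and essentially unbiased at large k.

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2, runs = 4, 2.0, 2000
theta = rng.standard_normal(n)

def avg_sigma2_hat(k):
    """Average of (1/k)*||Z - H theta_LS||^2 over many noise realizations."""
    out = 0.0
    for _ in range(runs):
        H = rng.standard_normal((k, n))
        Z = H @ theta + np.sqrt(sigma2) * rng.standard_normal(k)
        resid = Z - H @ np.linalg.lstsq(H, Z, rcond=None)[0]
        out += resid @ resid / k
    return out / runs

small, large = avg_sigma2_hat(8), avg_sigma2_hat(200)
print(small < 0.85 * sigma2)        # biased low at small k: E = [(k - n)/k] sigma2
print(abs(large - sigma2) < 0.1)    # nearly unbiased at large k
```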
CONSISTENCY

Definition 7-5.  θ̂(k) is a consistent estimator of θ if

    plim θ̂(k) = θ    (7-12) ∎

Note that consistency means the same thing as convergence in probability. For an estimator to be consistent, its probability limit θ* must equal its true value θ. Note, also, that a consistent estimator need not be unbiased or asymptotically unbiased.

Why is convergence in probability so popular and widely used in the estimation field? One reason is that plim(·) can be treated as an operator. For example, suppose X_k and Y_k are two random sequences for which plim X_k = X and plim Y_k = Y; then (see Tucker, 1962, for simple proofs of these and other facts),

    plim X_kY_k = (plim X_k)(plim Y_k) = XY    (7-13)

and

    plim X_k/Y_k = (plim X_k)/(plim Y_k) = X/Y    (7-14)

Additionally, suppose A_k and B_k are two commensurate matrix sequences, for which plim A_k = A and plim B_k = B [note that plim A_k, for example, means (plim a_ij(k))]; then

    plim A_kB_k = (plim A_k)(plim B_k) = AB    (7-15)

    plim A_k⁻¹ = (plim A_k)⁻¹ = A⁻¹    (7-16)

and

    plim A_k⁻¹B_k = A⁻¹B    (7-17)

The treatment of plim(·) as an operator often makes the study of consistency quite easy. We shall demonstrate the truth of this in Lesson 8 when we examine the consistency of the least-squares estimator.

A second reason for the importance of consistency is the property that consistency carries over; i.e., any continuous function of a consistent estimator is itself a consistent estimator [see Tucker, 1967, for a proof of this property, which relies heavily on the preceding treatment of plim(·) as an operator].

Example 7-2

Suppose θ̂ is a consistent estimator of θ. Then 1/θ̂ is a consistent estimator of 1/θ, (θ̂)² is a consistent estimator of θ², and ln θ̂ is a consistent estimator of ln θ. These facts are all due to the consistency carry-over property. ∎

The reader may be scratching his or her head at this point and wondering about the emphasis placed on these illustrative examples. Isn't, for example, estimating θ² by (θ̂)² the natural thing to do? The answer is yes, but only if we know ahead of time that θ̂ is a consistent estimator of θ. If you do not know this to be true, then there is no guarantee that the estimator of θ² equals (θ̂)². In Lesson 11 we show that maximum-likelihood estimators are consistent; thus the maximum-likelihood estimator of θ² equals (θ̂_ML)². We mention this property about maximum-likelihood estimators here, because one must know whether or not an estimator is consistent before applying the consistency carry-over property. Not all estimators are consistent!

Finally, this carry-over property for consistency does not necessarily apply to other properties. For example, if θ̂(k) is an unbiased estimator of θ, then Aθ̂(k) + b will be an unbiased estimator of Aθ + b; but θ̂²(k) will not be an unbiased estimator of θ².

How do you determine whether or not an estimator is consistent? Often, the direct approach, which makes heavy use of plim(·) operator algebra, is possible. Sometimes, an indirect approach is used, one which examines whether both the bias in θ̂(k) and the variance of θ̂(k) approach zero as k → ∞. In order to understand the validity of this indirect approach, we digress to discuss mean-squared convergence and its relationship to convergence in probability.

Definition 7-6.  Estimator θ̂(k) converges to θ in a mean-squared sense if

    lim_{k→∞} E{[θ̂(k) − θ]²} = 0    (7-18) ∎

Theorem 7-1.  If θ̂(k) converges to θ in mean-square, then it converges to θ in probability.

Proof (Papoulis, 1965, pg. 151).  Recall the Inequality of Bienaymé:

    Pr[|x − a| ≥ ε] ≤ E{|x − a|²}/ε²    (7-19)

Let a = 0 and x = θ̂(k) − θ in (7-19), and take the limit as k → ∞ on both sides of (7-19), to see that

    lim_{k→∞} Pr[|θ̂(k) − θ| ≥ ε] ≤ lim_{k→∞} E{[θ̂(k) − θ]²}/ε²    (7-20)

Using the fact that θ̂(k) converges to θ in mean-square, we see that

    lim_{k→∞} Pr[|θ̂(k) − θ| ≥ ε] = 0    (7-21)

thus, θ̂(k) converges to θ in probability. ∎

Recall, from probability theory, that although mean-squared convergence implies convergence in probability, the converse is not true.

Example 7-3 (Kmenta, 1971, pg. 166)

Let θ̂(k) be an estimator of θ, and let the probability distribution of θ̂(k) be the two-point distribution

    Pr[θ̂(k) = θ] = 1 − 1/k,    Pr[θ̂(k) = k] = 1/k
In this example t?(k) can only assumetwo different values, 6 and k. Obviously, 6 (k) is PROBLEMS
consistent, becauseas k -+CCthe probability that 6(k) equals 8 approaches unity, i.e.,
plim 6(k) = 0. Observe, also, that E{i(k)] = 1 + G(l - l/k), which means that 6(k) is 7-l. Random variable X - N(x ; p, d), and we are given a random sample (x,,
biasede x2, * * *, xH). Consider the following estimator for p,
Now let us investigate the mean-squared error between i and 0; i.e.,
where a 2 0.
(a) For what value(s) of a is r;(N) an asymptotically unbiased estimator of p?
(b) Prove that F;(N) is a consistent estimator of ~1for all a 2 0.
In this pathological example, the mean-squared error is diverging to infinity; but, 6(k) (c) Compare the results obtained in (a) with those obtained in Problem 6-3.
converges to 0 in probability. 0 7-2. Suppose zl, z2, , . . : zN are random samples from a Gaussian distribution with
unknown mean, p, and variance, 2. Reasonable estimators of p and 2 are the
Theorem 7-L Let 6(k) denote an estimator uf 0. If bias 6(k) und vari- sample mean and sample variance
ance 6(k) both approach zero as k* x, then the mean-squared error between
6(k) and 8 appruuches zero, and, therefore, 6(k) i.sa cunsistent estimatur of 0.
Proc$ From elementary probability theory, we know that
If, as assumed,bias I@) and variance I both approachzero as k + =, then (a) Is sz an asymptotically unbiased estimator of d? [Hint: Show that
E(s*} = (N - 1)&N].
(7-23) (b) Compare the result in (a) with that from Problem 6-4.
(c) One can show that the variance of s2 is
which means that 6(k) convergesto 0 in mean-square.Thus, by Theorem 7-l
- u 2(p4 - 2$) - 3u4
var(s2) =- ct4
+ p4
@c) also convergesto 8 in probability. III -
N N2 N3
The importance of Theorem 7-2 is that it provides a constructive way to Explain whether or not s* is a consistent estimator of 02.
test for consistency. 7-3. Random variable X - N(x; p ,g). Consider the following estimator of the
population mean obtained from a random sample of N observations of X,
/.i(N)=x +;
ASYMPTOTiC EFFICIENCY where a is a finite constant, and f is the sample mean.
(a) What are the asymptotic mean and variance of @(IV)?
Definition 7-7. i(k) is un asymptoticaily efficient estimator of scalar (b) Is r;(N) a consistent estimator of p?
parameter 0, ifi I. 6(k) has an asymptotic distribution with finite mean and (c) Is I;(N) asymptotically efficient?
variance, 2. 6(k) ti cunsistent, and 3. the variance uf the asymptotic distribu- 7-4. Let Xbe a Gaussianvariable with mean p and variance 2. Consider the problem
tion equals of estimating p from a random sample of observations xl, x2, . . . , xN. Three
estimators are proposed:
Estimator
Properties
Properties 61 fi2 c;3
Small sample
Unbiasedness
Efficiency
of L eust-Squares
Large sample
Un biasedness
Consistency
Efficiency
Estimators
Entries to this table are yes or no.
7-S. If plim Xk = X and plim Yk = Y where X and Y are constants, then plim
(Xk + Yk) = X + YandplimcXk = cX, where c is a constant. Prove that plim
XkYk = XY{Hint: XkYk = a [(Xk + Yk)2 - (Xk - yk )2])*
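Theorem 7-2's constructive test, and the gap Example 7-3 exposes, can be illustrated numerically. The sketch below simulates the pathological estimator of Example 7-3; the value θ = 2 and the draw counts are illustrative assumptions, not the book's numbers. The empirical probability that θ̂(k) = θ tends to one (consistency), while the empirical mean-squared error grows without bound.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0  # illustrative true parameter

def draw_estimates(k, n_draws):
    # theta-hat(k) = theta with probability 1 - 1/k, and = k with probability 1/k
    u = rng.random(n_draws)
    return np.where(u < 1.0 - 1.0 / k, theta, float(k))

results = {}
for k in (10, 100, 10_000):
    est = draw_estimates(k, 200_000)
    p_correct = np.mean(est == theta)     # -> 1 as k grows: plim theta-hat(k) = theta
    mse = np.mean((est - theta) ** 2)     # ~ (k - theta)^2 / k -> infinity
    results[k] = (p_correct, mse)
    print(k, p_correct, mse)
```

So convergence in probability holds even though mean-squared convergence fails, exactly the one-way implication noted before Example 7-3.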
LESSON 8  PROPERTIES OF LEAST-SQUARES ESTIMATORS

SMALL SAMPLE PROPERTIES OF LEAST-SQUARES ESTIMATORS

In this section (parts of which are taken from Mendel, 1973, pp. 75-86) we examine the bias and variance of weighted least-squares and least-squares estimators.
To begin, we recall Example 6-2, in which we showed that, when H(k) is deterministic, the WLSE of θ is unbiased. We also showed [after the statement of Theorem 6-2 and Equation (6-13)] that our recursive WLSE of θ has the requisite structure of an unbiased estimator; but, that unbiasedness of the recursive WLSE of θ also requires h(k + 1) to be deterministic.
When H(k) is random, we have the following important result:

Theorem 8-1. The WLSE of θ,

    θ̂_WLS(k) = [H'(k)W(k)H(k)]⁻¹ H'(k)W(k)Z(k)    (8-1)

is an unbiased estimator of θ when V(k) is zero mean and the elements of H(k) are statistically independent of the elements of V(k).
…Such a condition is more difficult to verify ahead of time than independence, especially since the estimator is a very nonlinear transformation of the random elements of H(k).

Example 8-1
Recall the impulse response identification Example 2-1, in which θ = col [h(1), h(2), …, h(n)], where h(i) is the value of the sampled impulse response at time i. System input u(k) may be deterministic or random.
If {u(k), k = 0, 1, …, N − 1} is deterministic, then H(N − 1) [see Equation (2-5)] is deterministic, so that θ̂_WLS(N) is an unbiased estimator of θ. Often, one uses a random input sequence for {u(k), k = 0, 1, …, N − 1}, such as from a random number generator. …

…and, to obtain â_LS. In order to study the bias of â_LS we use (8-2), in which W(k) is set equal to I, and H(k) and V(k) are defined in (8-11). We also set θ̂_WLS(k) = â_LS(k + 1). The argument of â_LS is k + 1 instead of k, because the argument of Z in (8-11) is k + 1. Doing this, we find that

    â_LS(k + 1) = a − [Σ_i v(i)y(i)] / [Σ_i y²(i)]    (8-12)

thus,

    E{â_LS(k + 1)} = a − E{[Σ_i v(i)y(i)] / [Σ_i y²(i)]}    (8-13)

Note, from (8-10), that y(j) depends at most on v(j − 1); therefore, …

…where we have made use of the fact that W is a symmetric matrix and the transpose and inverse symbols may be permuted. From probability theory (e.g., Papoulis, 1965), recall that

    E_{x,v}{ · } = E_x{ E_{v|x}{ · } }    (8-22)

Applying (8-22) to (8-21), we obtain (8-18). □
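The point of Example 8-2, that a random H(k) correlated with the noise makes the LSE biased in small samples, can be checked with a short Monte Carlo experiment. This sketch uses a first-order autoregression y(k + 1) = a y(k) + v(k); the values a = 0.9, N = 20, and the trial count are illustrative assumptions, not the book's numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, N, n_trials = 0.9, 20, 20_000

est = np.empty(n_trials)
for t in range(n_trials):
    v = rng.standard_normal(N)
    y = np.zeros(N + 1)
    for k in range(N):                 # y(k+1) = a*y(k) + v(k)
        y[k + 1] = a_true * y[k] + v[k]
    # LSE of a: regress y(k+1) on y(k)
    est[t] = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

bias = est.mean() - a_true
print(bias)   # noticeably negative for such a short record
```

The regressor y(k) is built from past noise samples, so the ratio's numerator and denominator are correlated and the expectation in (8-13) is not zero; the estimator is nonetheless still consistent as the record length grows.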
…where I_k is the k × k identity matrix. Let

    M = I_k − H(H'H)⁻¹H'    (8-27)

Matrix M is idempotent, i.e., M' = M and M² = M; therefore,

    E{Ṽ'(k)Ṽ(k)} = E{V'(k)MV(k)} = E{tr [MV(k)V'(k)]}    (8-28)

Recall the following well-known facts about the trace of a matrix:
1. E{tr A} = tr E{A}
2. tr cA = c tr A, where c is a scalar
3. tr (A + B) = tr A + tr B
4. tr I_n = n
5. tr AB = tr BA
Using these facts, and E{V(k)V'(k)} = σ_v² I_k, we now continue the development of (8-28), as follows:

    E{Ṽ'Ṽ} = tr [M E{VV'}] = σ_v² tr M = σ_v² tr [I_k − H(H'H)⁻¹H']
            = σ_v² k − σ_v² tr [(H'H)(H'H)⁻¹]
            = σ_v² k − σ_v² tr I_n = σ_v²(k − n)    (8-29)

Solving this equation for σ_v², we find that

    σ_v² = E{Ṽ'Ṽ}/(k − n)    (8-30)

Although this is an exact result for σ_v², it is not one that can be evaluated, because we cannot compute E{Ṽ'Ṽ}.
Using the structure of (8-30) as a starting point, we estimate σ_v² by the simple formula

    σ̂_v²(k) = Ṽ'(k)Ṽ(k)/(k − n)    (8-31)

To show that σ̂_v²(k) is an unbiased estimator of σ_v², we observe that

    E{σ̂_v²(k)} = E{Ṽ'Ṽ}/(k − n) = σ_v²(k − n)/(k − n) = σ_v²    (8-32)

where we have used (8-29) for E{Ṽ'Ṽ}. □

LARGE SAMPLE PROPERTIES OF LEAST-SQUARES ESTIMATORS

Many large sample properties of LSEs are determined by establishing that the LSE is equivalent to another estimator for which it is known that the large sample property holds true. In Lesson 11, for example, we will provide conditions under which the LSE of θ, θ̂_LS(k), is the same as the maximum-likelihood estimator of θ, θ̂_ML(k). Because θ̂_ML(k) is consistent, asymptotically efficient, and asymptotically Gaussian, θ̂_LS(k) inherits all these properties.

Theorem 8-4. If

    plim [H'(k)H(k)/k] = Σ_H    (8-33)

Σ_H⁻¹ exists, and

    plim [H'(k)V(k)/k] = 0    (8-34)

then

    plim θ̂_LS(k) = θ    (8-35)

Note that the probability limit of a matrix equals a matrix each of whose elements is the probability limit of the respective matrix element. Assumption (8-33) postulates the existence of a probability limit for the second-order moments of the variables in H(k), as given by Σ_H. Assumption (8-34) postulates a zero probability limit for the correlation between H(k) and V(k). H'(k)V(k) can be thought of as a filtered version of noise vector V(k). For (8-34) to be true filter H(k) must be stable. If, for example, H(k) is deterministic and bounded, then (8-34) will be true.

Proof. Beginning with (8-2), but for θ̂_LS(k) instead of θ̂_WLS(k), we see that

    θ̂_LS(k) = θ + (H'H)⁻¹H'V    (8-36)

Operating on both sides of this equation with plim, and using properties (7-15), (7-16), and (7-17), we find that

    plim θ̂_LS(k) = θ + plim [(H'H/k)⁻¹ (H'V/k)]
                 = θ + plim (H'H/k)⁻¹ plim (H'V/k)
                 = θ + Σ_H⁻¹ · 0
                 = θ

which demonstrates that, under the given conditions, θ̂_LS(k) is a consistent estimator of θ. □

In some important applications Eq. (8-34) does not apply, e.g., Example 8-2. Theorem 8-4 then does not apply, and, the study of consistency is often quite complicated in these cases.

Theorem 8-5. If (8-33) and (8-34) are true, Σ_H⁻¹ exists, and

    plim [V'(k)V(k)/k] = σ_v²    (8-37)

then

    plim σ̂_v²(k) = σ_v²    (8-38)

where σ̂_v²(k) is given by (8-31).

Proof. From (8-26), we find that

    Ṽ = [I − H(H'H)⁻¹H']V

Consequently,

    Ṽ'Ṽ = V'V − V'H(H'H)⁻¹H'V    (8-39)

thus,

    plim σ̂_v²(k) = plim Ṽ'Ṽ/(k − n)
        = plim V'V/(k − n) − plim V'H/(k − n) · plim [H'H/(k − n)]⁻¹ · plim H'V/(k − n)
        = σ_v² − 0 · Σ_H⁻¹ · 0 = σ_v²    □

PROBLEMS

8-1. Suppose that θ̂_LS is an unbiased estimator of θ. Is (θ̂_LS)² an unbiased estimator of θ²? (Hint: Use the least-squares batch algorithm to study this question.)
8-2. For θ̂_WLS(k) to be an unbiased estimator of θ we required E{V(k)} = 0. This problem considers the case when E{V(k)} ≠ 0.
(a) Assume that E{V(k)} = V₀, where V₀ is known to us. How is the concatenated measurement equation Z(k) = H(k)θ + V(k) modified in this case so we can use the results derived in this lesson to obtain θ̂_WLS(k) or θ̂_LS(k)?
(b) Assume that E{V(k)} = m_v 1, where m_v is constant but is unknown. How is the concatenated measurement equation Z(k) = H(k)θ + V(k) modified in this case so that we can obtain least-squares estimates of both θ and m_v?
8-3. Consider the stable autoregressive model y(k) = a₁y(k − 1) + ⋯ + a_n y(k − n) + …

LESSON 9  BEST LINEAR UNBIASED ESTIMATION

INTRODUCTION

Least-squares estimation, as described in Lessons 3, 4 and 5, is for the linear model

    Z(k) = H(k)θ + V(k)    (9-1)

where θ is a deterministic, but unknown, vector of parameters, H(k) can be deterministic or random, and we do not know anything about V(k) ahead of time. By minimizing Z̃'(k)W(k)Z̃(k), where Z̃(k) = Z(k) − H(k)θ̂_WLS(k), we determined that θ̂_WLS(k) is a linear transformation of Z(k), i.e., θ̂_WLS(k) = F_WLS(k)Z(k). After establishing the structure of θ̂_WLS(k), we studied its small- and large-sample properties. Unfortunately, θ̂_WLS(k) is not always unbiased or efficient. These properties were not built into θ̂_WLS(k) during its design.
In this lesson we develop our second estimator. It will be both unbiased and efficient, by design. …
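Theorems 8-4 and 8-5 can be illustrated numerically when the regressors are random but independent of the noise. In the sketch below (dimensions, θ, and σ_v² are illustrative assumptions), the estimation error of θ̂_LS(k) shrinks and σ̂_v²(k) of (8-31) approaches σ_v² as k grows.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([1.0, -2.0])   # illustrative true parameters (n = 2)
sigma_v2 = 0.25                 # true noise variance

errs, var_ests = [], []
for k in (50, 500, 5000):
    H = rng.standard_normal((k, 2))              # random, independent of the noise
    V = np.sqrt(sigma_v2) * rng.standard_normal(k)
    Z = H @ theta + V
    theta_hat = np.linalg.lstsq(H, Z, rcond=None)[0]   # least-squares estimate
    resid = Z - H @ theta_hat
    var_ests.append(resid @ resid / (k - 2))           # Eq. (8-31) with n = 2
    errs.append(np.linalg.norm(theta_hat - theta))
print(errs, var_ests)
```

Here plim[H'H/k] exists (the regressors have unit second moments) and H'V/k tends to zero, so both theorems apply.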
…where δ_kj is the Kronecker δ (i.e., δ_kj = 0 for k ≠ j and δ_kj = 1 for k = j); thus,

    ℛ(k) = E{V(k)V'(k)} = diag [σ²(k), σ²(k − 1), …, σ²(k − N + 1)]

In the case of vector measurements, z(k), this means that vector noise, v(k), is white, i.e.,

    E{v(k)v'(j)} = R(k)δ_kj    (9-4)

thus,

    ℛ(k) = diag [R(k), R(k − 1), …, R(k − N + 1)]    (9-6)

PROBLEM STATEMENT AND OBJECTIVE FUNCTION

We begin by assuming the following linear structure for θ̂_BLU(k),

    θ̂_BLU(k) = F(k)Z(k)    (9-7)

where, for notational simplicity, we have omitted subscripting F(k) as F_BLU(k). We shall design F(k) such that:
a. θ̂_BLU(k) is an unbiased estimator of θ, and
b. the error variance for each one of the n parameters is minimized. In this way, θ̂_BLU(k) will be unbiased and efficient, by design.
Recall, from Theorem 6-1, that unbiasedness constrains design matrix F(k), such that

    F(k)H(k) = I

This constraint can now be expressed in terms of the vector components of F(k). For our purposes, it is easier to work with its transpose,

    H'(k)F'(k) = I    (9-9)

which can be expressed as follows, where e_i is the ith unit vector,

    e_i = col (0, 0, …, 0, 1, 0, …, 0)    (9-10)

in which the nonzero element occurs in the ith position. Equating respective elements on both sides of (9-9), we find that

    H'(k)f_i(k) = e_i,   i = 1, 2, …, n    (9-11)

Our single unbiasedness constraint on matrix F(k) is now a set of n constraints on the rows of F(k).
Next, we express E{[θ_i − θ̂_i,BLU(k)]²} in terms of f_i (i = 1, 2, …, n). We shall make use of (9-11), (9-1), and the following equivalent representation of (9-7):

    θ̂_i,BLU(k) = f_i'(k)Z(k),   i = 1, 2, …, n    (9-12)

Proceeding, we find that

    E{[θ_i − θ̂_i,BLU(k)]²} = E{(θ_i − f_i'Z)²}
        = E{(θ_i − f_i'Hθ − f_i'V)²}
        = E{(θ_i − e_i'θ − f_i'V)²}
        = E{(f_i'V)²} = E{f_i'VV'f_i} = f_i'ℛf_i    (9-13)

where we have used (9-11), e_i'θ = θ_i, and E{V(k)} = 0.
Observe that the error-variance for the ith parameter depends only on the ith row of design matrix F(k). We, therefore, establish the following objective function, in which λ_i is a vector of Lagrange multipliers that adjoins the ith unbiasedness constraint:

    J_i(f_i, λ_i) = f_i'ℛ(k)f_i + λ_i'[H'(k)f_i(k) − e_i],   i = 1, 2, …, n    (9-14)

DERIVATION OF ESTIMATOR

A necessary condition for minimizing J_i(f_i, λ_i) is ∂J_i(f_i, λ_i)/∂f_i = 0 (i = 1, 2, …, n); hence,

    2ℛf_i + Hλ_i = 0    (9-15)

from which we determine f_i, as

    f_i = −½ ℛ⁻¹Hλ_i    (9-16)
For (9-16) to be valid, ℛ⁻¹ must exist. Any noise V(k) whose covariance matrix ℛ is positive definite qualifies. Of course, if V(k) is white, then ℛ is diagonal (or block diagonal) and ℛ⁻¹ exists. This may also be true if V(k) is not white. A second necessary condition for minimizing J_i(f_i, λ_i) is ∂J_i(f_i, λ_i)/∂λ_i = 0 (i = 1, 2, …, n), which gives us the unbiasedness constraints.

Corollary 9-1. All results obtained in Lessons 3, 4, and 5 for θ̂_WLS(k) can be applied to θ̂_BLU(k) by setting W(k) = ℛ⁻¹(k). □

We leave it to the reader to explore the full implications of this important corollary, by reexamining the wide range of topics which were discussed in Lessons 3, 4 and 5.
Proof. Recall Equation (13) of Lesson 4, that

    P(k) = [H'(k)W(k)H(k)]⁻¹    (9-26)

…

    F(k)H(k) = I

thus, because θ̂_BLU(k) = F(k)Z(k) and F(k)H(k) = I,

    Σ_BLU = F ℛ F'    (9-33)

Substituting (9-32) and (9-33) into (9-29), and making repeated use of the unbiasedness constraints, we find that the proof of this result is a direct consequence of Theorems 9-2 and 9-4. □

At the end of Lesson 3 we noted that θ̂_WLS(k) may not be invariant under changes of scale. We demonstrate next that θ̂_BLU(k) is invariant to such changes.

Theorem 9-5. θ̂_BLU(k) is invariant under changes of scale.

Proof (Mendel, 1973, pp. 151-157). Assume that observers A and B are observing a process; but, observer A reads the measurements in one set of units and B in another. Let M be a symmetric matrix of scale factors relating A to B (e.g., 5,280 ft/mile, 454 g/lb, etc.), and Z_A(k) and Z_B(k) denote the total measurement vectors of A and B, respectively. Then

    Z_B(k) = H_B(k)θ + V_B(k) = M Z_A(k) = M H_A(k)θ + M V_A(k)    (9-37)
Let θ̂_A,BLU(k) and θ̂_B,BLU(k) denote the BLUEs associated with observers A and B, respectively; then, …

RECURSIVE BLUES

Because of Corollary 9-1, we obtain recursive formulas for the BLUE of θ by setting 1/w(k + 1) = r(k + 1) in the recursive formulas for the WLSEs of θ which are given in Lesson 4. In the case of a vector of measurements, we set (see Table 5-1) W⁻¹(k + 1) = R(k + 1).

Theorem 9-6 (Information Form of Recursive BLUE). A recursive structure for θ̂_BLU(k) is:

    θ̂_BLU(k + 1) = θ̂_BLU(k) + K_B(k + 1)[z(k + 1) − h'(k + 1)θ̂_BLU(k)]    (9-42)

    K_B(k + 1) = P(k + 1)h(k + 1)r⁻¹(k + 1)    (9-43)

    P⁻¹(k + 1) = P⁻¹(k) + h(k + 1)r⁻¹(k + 1)h'(k + 1)    (9-44)

These equations are initialized by θ̂_BLU(n) and P⁻¹(n) [where P(k) is cov [θ̂_BLU(k)], given in (9-31)] and are used for k = n, n + 1, …, N − 1. These equations can also be used for k = 0, 1, …, N − 1 as long as θ̂_BLU(0) and P⁻¹(0) are chosen using Equations (21) and (20) in Lesson 4, respectively, in which w(0) is replaced by r⁻¹(0). □

Theorem 9-7 (Covariance Form of Recursive BLUE). Another recursive structure for θ̂_BLU(k) is (9-42) in which

    K_B(k + 1) = P(k)h(k + 1)[h'(k + 1)P(k)h(k + 1) + r(k + 1)]⁻¹    (9-45)

Recall that, in best linear unbiased estimation, P(k) = cov [θ̂_BLU(k)]. Observe, in Theorem 9-7, that we compute P(k) recursively, and not P⁻¹(k). This is why the results in Theorem 9-7 (and, subsequently, Theorem 4-2) are referred to as the covariance form of recursive BLUE.

PROBLEMS

9-1. (Mendel, 1973, Exercise 3-2, pg. 175). Assume H(k) is random, and that θ̂_BLU(k) = F(k)Z(k).
(a) Show that unbiasedness of the estimate is attained when E{F(k)H(k)} = I.
(b) At what point in the derivation of θ̂_BLU(k) do the computations break down because H(k) is random?
9-2. Here we examine the situation when V(k) is colored noise and how to use a model for it to compute θ̂_BLU(k). Now our linear model is

    z(k + 1) = H(k + 1)θ + v(k + 1)

where v(k) is colored noise modeled as

    v(k + 1) = A_v v(k) + ε(k)

We assume that deterministic matrix A_v is known and that ε(k) is zero-mean white noise with covariance Σ_ε(k). Working with the measurement difference z*(k + 1) = z(k + 1) − A_v z(k), write down the formula for θ̂_BLU(k) in batch form. Be sure to define all concatenated quantities.
9-3. (Sorenson, 1980, Exercise 3-15, pg. 130). Suppose θ̂₁ and θ̂₂ are unbiased estimators of θ, with var (θ̂₁) = σ₁² and var (θ̂₂) = σ₂². Let θ̂₃ = αθ̂₁ + (1 − α)θ̂₂.
(a) Prove that θ̂₃ is unbiased.
(b) Assume that θ̂₁ and θ̂₂ are statistically independent, and find the mean-squared error of θ̂₃.
(c) What choice of α minimizes the mean-squared error?
9-4. (Mendel, 1973, Exercise 3-12, pp. 176-177). A series of measurements z(k) are made, where z(k) = Hθ + v(k), H is an m × n constant matrix, E{v(k)} = 0, and cov [v(k)] = R is a constant matrix.
(a) Using the two formulations of the recursive BLUE show that (Ho, 1963, pp. 152-154):
(i) P(k + 1)H' = P(k)H'[HP(k)H' + R]⁻¹R, and
(ii) HP(k) = R[HP(k − 1)H' + R]⁻¹HP(k − 1).
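Corollary 9-1 and Theorem 9-7 can be exercised together: initialize the covariance-form recursion from the first n measurements, as prescribed after Theorem 9-6, and it reproduces the batch BLUE (the WLSE with W(k) = ℛ⁻¹(k)) exactly. A sketch follows; all numerical values are illustrative assumptions, and the P(k) update written here is the standard covariance-form update, which the excerpt truncates.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 2, 40
theta = np.array([0.5, 1.5])                 # illustrative true parameters
H = rng.standard_normal((N, n))              # rows are h'(k)
r = np.full(N, 0.2)                          # measurement-noise variances r(k)
z = H @ theta + np.sqrt(r) * rng.standard_normal(N)

# Batch BLUE via Corollary 9-1: the WLSE with W = R^{-1}
Winv = np.diag(1.0 / r)
theta_batch = np.linalg.solve(H.T @ Winv @ H, H.T @ Winv @ z)

# Initialize from the first n measurements: theta_BLU(n) and P(n)
Hn, zn, Wn = H[:n], z[:n], np.diag(1.0 / r[:n])
P = np.linalg.inv(Hn.T @ Wn @ Hn)
theta_hat = P @ Hn.T @ Wn @ zn

# Covariance-form recursive BLUE, Eqs. (9-42) and (9-45)
for k in range(n, N):
    h = H[k]
    K = P @ h / (h @ P @ h + r[k])                       # gain, Eq. (9-45)
    theta_hat = theta_hat + K * (z[k] - h @ theta_hat)   # update, Eq. (9-42)
    P = P - np.outer(K, h) @ P                           # P-update (assumed form)

print(theta_batch, theta_hat)   # identical up to roundoff
```

The agreement is algebraic, not statistical: the recursion is an exact rearrangement of the batch solution.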
LESSON 10  LIKELIHOOD

LIKELIHOOD DEFINED
Results, R, are the outputs of an experiment. In our work on parameter estimation for the linear model Z(k) = H(k)θ + V(k), the results are the data in Z(k) and H(k).
We let P(R|H) denote the probability of obtaining results R given hypothesis H according to some probability model, e.g., p[z(k)|θ]. In probability, P(R|H) is always viewed as a function of R for fixed values of H. Usually the explicit dependence of P on H is not shown. In order to understand the differences between probability and likelihood, it is important to show the explicit dependence of P on H.

Example 10-1
Random number generators are often used to generate a sequence of random numbers that can then be used as the input sequence to a dynamical system, or as an additive measurement noise sequence. To run a random number generator, you must choose a probability model. The Gaussian model is often used; however, it is characterized by two parameters, mean μ and variance σ². In order to obtain a stream of Gaussian random numbers from the random number generator, you must fix μ and σ². Let μ_T and σ_T² denote (true) values chosen for μ and σ². The Gaussian probability density function for the generator is p[z(k)|μ_T, σ_T²], and the numbers we obtain at its output, z(1), z(2), …, are of course quite dependent on the hypothesis H_T = (μ_T, σ_T²). □

For fixed H we can apply the axioms of probability (e.g., see Papoulis, 1965). If, for example, results R₁ and R₂ are mutually exclusive, then P(R₁ or R₂|H) = P(R₁|H) + P(R₂|H).

Definition 10-1 (Edwards, 1972, pg. 9). Likelihood, L(H|R), of the hypothesis H given the results R and a specific probability model is proportional to P(R|H), the constant of proportionality being arbitrary, i.e.,

    L(H|R) = c P(R|H)    (10-1) □

For likelihood, R is fixed (i.e., given ahead of time) and H is variable. There are no axioms of likelihood. Likelihoods cannot be compared using different data sets (i.e., different results, say, R₁ and R₂) unless the data sets are statistically independent.

Example 10-2
Suppose we are given a sequence of Gaussian random numbers, using the random number generator that was described in Example 10-1, but we do not know μ_T and σ_T². Is it possible to infer (i.e., estimate) what the values of μ and σ² were that most likely generated the given sequence? The method of maximum-likelihood, which we study in Lesson 11, will show us how to do this. The starting point for the estimation of μ and σ² will be p[z(k)|μ, σ²], where now z(k) is fixed and μ and σ² are treated as variables. □

Example 10-3 (Edwards, 1972, pg. 10)
To further illustrate the difference between P(R|H) and L(H|R), we consider the following binomial model, which we assume describes the occurrence of boys and girls in a family of two children:

    P(R|p) = [2!/(m! f!)] p^m (1 − p)^f    (10-2)

where p denotes the probability of a male child, m equals the number of male children, f equals the number of female children, and, in this example,

    m + f = 2    (10-3)

Our objective is to determine p; but, to do this we need some results. Knocking on neighbors' doors and conducting a simple survey, we establish two data sets:

    R₁ = {1 boy and 1 girl} → m = 1 and f = 1    (10-4)
    R₂ = {2 boys} → m = 2 and f = 0    (10-5)

In order to keep the determination of p simple for this meager collection of data, we shall only consider two values for p, i.e., two hypotheses,

    H₁: p = 1/4
    H₂: p = 1/2    (10-6)

To begin, we create a table of probabilities, in which the entries are P(R_i|H_j fixed), where this is computed using (10-2). For H₁ (i.e., p = 1/4), P(R₁|1/4) = 3/8 and P(R₂|1/4) = 1/16; for H₂ (i.e., p = 1/2), P(R₁|1/2) = 1/2 and P(R₂|1/2) = 1/4. These results are collected together in Table 10-1.

TABLE 10-1  P(R_i|H_j fixed)
          R₁      R₂
    H₁    3/8     1/16
    H₂    1/2     1/4

Next, we create a table of likelihoods, using (10-1). In this table (Table 10-2) the entries are L(H_i|R_j fixed). Constants c₁ and c₂ are arbitrary, and c₁, for example, appears in each one of the table entries in the R₁ column.

TABLE 10-2  L(H_i|R_j fixed)
          R₁        R₂
    H₁    3/8 c₁    1/16 c₂
    H₂    1/2 c₁    1/4 c₂

What can we conclude from the table of likelihoods? First, for data R₁, the likelihood of H₁ is 3/4 the likelihood of H₂. The number 3/4 was obtained by taking the ratio of likelihoods L(H₁|R₁) and L(H₂|R₁). Second, on data R₂, the likelihood of H₁ is 1/4 the likelihood of H₂ [note that 1/4 = (1/16)c₂ / (1/4)c₂]. On the combined results, the likelihood ratio of H₁ to H₂ is 3/4 × 1/4 = 3/16, which reinforces our intuition that p = 1/2 is much more likely than p = 1/4. Finally, we conclude that, even from our two meager results, the value p = 1/2 appears to be more likely than the value p = 1/4, which, of course, agrees with our intuition. □
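Tables 10-1 and 10-2 of Example 10-3 are easy to reproduce; the sketch below computes the probabilities from (10-2) and the likelihood ratios quoted in the text (the arbitrary constants c₁ and c₂ cancel in the ratios, so they are omitted).

```python
from math import comb

def prob(m, f, p):
    # Eq. (10-2): P(R|p) = [2!/(m! f!)] p^m (1 - p)^f, with m + f = 2
    return comb(2, m) * p**m * (1 - p)**f

# R1 = {1 boy, 1 girl}: m = 1, f = 1;  R2 = {2 boys}: m = 2, f = 0
table = {name: {p: prob(m, f, p) for p in (0.25, 0.5)}
         for name, (m, f) in {"R1": (1, 1), "R2": (2, 0)}.items()}
print(table)   # Table 10-1: 3/8, 1/2, 1/16, 1/4

# Likelihood ratios of H1 (p = 1/4) to H2 (p = 1/2)
ratio_R1 = table["R1"][0.25] / table["R1"][0.5]   # 3/4
ratio_R2 = table["R2"][0.25] / table["R2"][0.5]   # 1/4
print(ratio_R1, ratio_R2, ratio_R1 * ratio_R2)    # 0.75 0.25 0.1875 (= 3/16)
```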
LESSON 11  MAXIMUM-LIKELIHOOD ESTIMATION

In many applications p(Z|θ) is exponential (e.g., Gaussian). It is then easier to work with the natural logarithm of l(θ|Z) than with l(θ|Z). Let

    L(θ|Z) = ln l(θ|Z)    (11-4)

Quantity L is sometimes referred to as the log-likelihood function, the support function (Kmenta, 1971), the likelihood function [Mehra (1971), Schweppe (1965), and Stepner and Mehra (1973), for example], or the conditional likelihood function (Nahi, 1969). We shall use these terms interchangeably.

Recognizing that the second term in (11-7) is a quadratic form, we can write it in a more compact notation as ½(θ − θ̂_ML)'J₀(θ̂_ML|Z)(θ − θ̂_ML), where J₀(θ̂_ML|Z), the observed Fisher information matrix [see Equation (6-46)], is

    J₀(θ̂_ML|Z) = (∂²L(θ|Z)/∂θ_i∂θ_j)|_{θ = θ̂_ML},   i, j = 1, 2, …, n    (11-8)

* The material in this section is taken from Mendel (1983b, pp. 94-95).
* The material in this section is taken from Mendel (1983b, pp. 95-98).
Now let us examine sufficient conditions for the likelihood function to be maximum. We assume that, close to θ̂, L(θ|Z) is approximately quadratic, in which case

    L(θ|Z) ≈ L(θ̂_ML|Z) + ½(θ − θ̂_ML)'J₀(θ̂_ML|Z)(θ − θ̂_ML)    (11-10)

From vector calculus, it is well known that a sufficient condition for a function of n variables to be maximum is that the matrix of second partial derivatives of that function, evaluated at the extremum, must be negative definite. For L(θ|Z) in (11-10), this means that a sufficient condition for L(θ|Z) to be maximized is

    J₀(θ̂_ML|Z) < 0    (11-11)

For a random sample z(1), z(2), …, z(N) from a Gaussian population with unknown mean μ and variance σ², the likelihood is

    l(μ, σ²|Z) = p(z(1)|μ, σ²) p(z(2)|μ, σ²) ⋯ p(z(N)|μ, σ²)    (11-14)

and its logarithm is

    L(μ, σ²|Z) = −(N/2) ln 2π − (N/2) ln σ² − (1/2σ²) Σ_{i=1}^{N} [z(i) − μ]²    (11-15)

There are two unknown parameters in L, μ and σ². Differentiating L with respect to each of them gives

    ∂L/∂μ = (1/σ²) Σ_{i=1}^{N} [z(i) − μ]    (11-17a)

    ∂L/∂(σ²) = −N/(2σ²) + (1/2σ⁴) Σ_{i=1}^{N} [z(i) − μ]²    (11-17b)

Equating these partial derivatives to zero, we obtain

    (1/σ̂²_ML) Σ_{i=1}^{N} [z(i) − μ̂_ML] = 0    (11-18a)

    −N/(2σ̂²_ML) + (1/2σ̂⁴_ML) Σ_{i=1}^{N} [z(i) − μ̂_ML]² = 0    (11-18b)

For σ̂²_ML different from zero, (11-18a) reduces to

    Σ_{i=1}^{N} [z(i) − μ̂_ML] = 0

so that μ̂_ML = (1/N) Σ_{i=1}^{N} z(i), the sample mean; and, from (11-18b), σ̂²_ML = (1/N) Σ_{i=1}^{N} [z(i) − μ̂_ML]². Thus, the MLE of the variance of a Gaussian population is simply equal to the sample variance. □
…is the Fisher information matrix [Equation (6-4)], and, (3) asymptotically efficient.

Proof. For proofs of consistency, asymptotic normality, and asymptotic efficiency, see Sorenson (1980, pp. 187-190, 190-192, and 192-193, respectively). These proofs, though somewhat heuristic, convey the ideas needed to prove the three parts of this theorem. More detailed analyses can be found in Cramer (1946) and Zacks (1971). See also Problems 11-13, 11-14, and 11-15. □

Theorem 11-2 (Invariance Property of MLEs). Let g(θ) be a vector function mapping θ into an interval in r-dimensional Euclidean space. Let θ̂_ML be a MLE of θ; then g(θ̂_ML) is a MLE of g(θ); i.e.,

    ĝ_ML = g(θ̂_ML)    (11-21)

Proof (See Zacks, 1971). Note that Zacks points out that in many books this theorem is cited only for the case of one-to-one mappings, g(θ). His proof does not require g(θ) to be one-to-one. Note, also, that the proof of this theorem is related to the consistency carry-over property of a consistent estimator, which was discussed in Lesson 7. □

Example 11-2
We wish to obtain a MLE of the variance σ_v² in our linear model z(k) = h'(k)θ + v(k). One approach is to let θ₁ = σ_v², establish the log-likelihood function for θ₁, and maximize it to determine θ̂₁,ML. Usually, mathematical programming (i.e., search techniques) must be used to determine θ̂₁,ML. Here is where a difficulty can occur, because θ₁ (a variance) is known to be positive; thus, θ̂₁,ML must be constrained to be positive. Unfortunately, constrained mathematical programming techniques are more difficult than unconstrained ones.
A second approach is to let θ₂ = σ_v, establish the log-likelihood function for θ₂ (it will be the same as the one for θ₁, except that θ₁ will be replaced by θ₂²), and maximize it to determine θ̂₂,ML. Because θ₂ is a standard deviation, which can be positive or negative, unconstrained mathematical programming can be used to determine θ̂₂,ML. Finally, we use the Invariance Property of MLEs to compute θ̂₁,ML, as

    θ̂₁,ML = (θ̂₂,ML)²    (11-22) □

THE LINEAR MODEL (H(k) DETERMINISTIC)

We return now to the linear model

    Z(k) = H(k)θ + V(k)    (11-23)

in which θ is an n × 1 vector of deterministic parameters, H(k) is deterministic, and V(k) is zero mean white noise, with covariance matrix ℛ(k). This is precisely the same model that was used to derive the BLUE of θ, θ̂_BLU(k). Our objectives in this paragraph are twofold: (1) to derive the MLE of θ, θ̂_ML(k), and (2) to relate θ̂_ML(k) to θ̂_BLU(k) and θ̂_WLS(k).
In order to derive the MLE of θ, we need to determine a formula for p(Z(k)|θ). To proceed, we assume that V(k) is Gaussian, with multivariate density function p(V(k)), where

    p(V(k)) = [(2π)^N |ℛ(k)|]^(−1/2) exp [−½ V'(k)ℛ⁻¹(k)V(k)]    (11-24)

Recall (e.g., Papoulis, 1965) that linear transformations on, and linear combinations of, Gaussian random vectors are themselves Gaussian random vectors. For this reason, it is clear that when V(k) is Gaussian, Z(k) is as well. The multivariate Gaussian density function of Z(k), derived from p(V(k)), is

    p(Z(k)|θ) = [(2π)^N |ℛ(k)|]^(−1/2) exp {−½ [Z(k) − H(k)θ]'ℛ⁻¹(k)[Z(k) − H(k)θ]}    (11-25)

Theorem 11-3. When p(Z|θ) is multivariate Gaussian and H(k) is deterministic, then the principle of ML leads to the BLUE of θ, i.e.,

    θ̂_ML(k) = θ̂_BLU(k)    (11-26)

Proof. We must maximize p(Z|θ) with respect to θ. This can be accomplished by minimizing the argument of the exponential in (11-25); hence, θ̂_ML(k) is the solution of

    d{[Z(k) − H(k)θ]'ℛ⁻¹(k)[Z(k) − H(k)θ]}/dθ |_{θ = θ̂_ML(k)} = 0    (11-27)

This equation can also be expressed as dJ[θ̂_ML]/dθ̂ = 0, where J[θ̂_ML] = Z̃'(k)ℛ⁻¹(k)Z̃(k) and Z̃(k) = Z(k) − H(k)θ̂_ML(k). Comparing this version of (11-27) with Equation (3-4) and the subsequent derivation of the WLSE of θ, we conclude that

    θ̂_ML(k) = [H'(k)ℛ⁻¹(k)H(k)]⁻¹ H'(k)ℛ⁻¹(k)Z(k)    (11-28)

but, we also know, from Lesson 9, that

    θ̂_BLU(k) = [H'(k)ℛ⁻¹(k)H(k)]⁻¹ H'(k)ℛ⁻¹(k)Z(k)    (11-29)

From (11-28) and (11-29), we conclude that

    θ̂_ML(k) = θ̂_BLU(k)    (11-30) □

We now suggest a reason why Theorem 9-6 (and, subsequently, Theorem 4-1) is referred to as the information form of recursive BLUE.
From Theorems 11-1 and 11-3 we know that, when H(k) is deterministic, θ̂_BLU is Gaussian, with mean θ and covariance proportional to J⁻¹. This means that P(k) = cov [θ̂_BLU(k)] is proportional to J⁻¹. Observe, in Theorem 9-6, that we compute P⁻¹ recursively, and not P. Because P⁻¹ is proportional to the Fisher information matrix J, the results in Theorem 9-6 (and 4-1) are therefore referred to as the information form of recursive BLUE.
A second, more pragmatic, reason is due to the fact that the inverse of any covariance matrix is known as an information matrix (e.g., Anderson and Moore, 1979, pg. 138). Consequently, any algorithm that is in terms of information matrices is known as an information form algorithm.

Corollary 11-1. If p[Z(k)|θ] is multivariate Gaussian, H(k) is deterministic, and ℛ(k) = σ_v² I, then

    θ̂_ML(k) = θ̂_BLU(k) = θ̂_LS(k)    (11-31)

These estimators are: unbiased, most efficient (within the class of linear estimators), consistent, and Gaussian.

Proof. To obtain (11-31), combine the results in Theorems 11-3 and 9-2. The estimators are:
1. unbiased, because θ̂_BLU(k) is unbiased;
2. most efficient, because θ̂_BLU(k) is most efficient;
3. consistent, because θ̂_BLU(k) is consistent; and
4. Gaussian, because they depend linearly upon Z(k), which is Gaussian. □

Observe that, when Z(k) is Gaussian, this corollary permits us to make statements about small-sample properties of MLEs. Usually, we cannot make such statements.

A LOG-LIKELIHOOD FUNCTION FOR AN IMPORTANT DYNAMICAL SYSTEM

In practice there are two major problems in obtaining MLEs of parameters in models of dynamical systems:
1. obtaining an expression for L(θ|Z), and
2. maximizing L(θ|Z) with respect to θ.
In this section we direct our attention to the first problem for a linear, time-invariant dynamical system that is excited by a known forcing function and has additive white Gaussian noise. This system is described by the following state-equation model:

    x(k + 1) = Φx(k) + Ψu(k)    (11-32)

    z(k + 1) = Hx(k + 1) + v(k + 1),   k = 0, 1, …, N − 1    (11-33)

In this model, u(k) is known ahead of time, x(0) is deterministic, E{v(k)} = 0, v(k) is Gaussian, and E{v(k)v'(j)} = Rδ_kj.
To begin, we must establish the parameters that constitute θ. In theory θ could contain all of the elements in Φ, Ψ, H and R. In practice, however, these matrices are never completely unknown. State equations are either derived from physical principles (e.g., Newton's laws) or associated with a canonical model (e.g., controllability canonical form); hence, we usually know that certain elements in Φ, Ψ and H are identically zero or are known constants.
Even though all of the elements in Φ, Ψ, H and R will not be unknown in an application, there still may be more unknowns present than can possibly be identified. How many parameters can be identified, and which parameters can be identified, by maximum-likelihood estimation (or, for that matter, by any type of estimation) is the subject of identifiability of systems (Stepner and Mehra, 1973). We shall assume that θ is identifiable. Identifiability is akin to existence. When we assume θ is identifiable, we assume that it is possible to identify θ by ML methods. This means that all of our statements are predicated by the statement: "If θ is identifiable, then … ."
We let

    θ = col (elements of Φ, Ψ, H, and R)    (11-34)

Example 11-3
The controllable canonical form state-variable representation for the discrete-time autoregressive moving average (ARMA) model

    H(z) = (β₁z^{n−1} + β₂z^{n−2} + ⋯ + β_n) / (z^n + a₁z^{n−1} + ⋯ + a_n)    (11-35)

which implies the ARMA difference equation (z denotes the unit advance operator)

    y(k + n) + a₁y(k + n − 1) + ⋯ + a_n y(k) = β₁u(k + n − 1) + ⋯ + β_n u(k)    (11-36)

is

    col [x₁(k + 1), x₂(k + 1), …, x_n(k + 1)]
        = [ 0     1     0   ⋯   0
            0     0     1   ⋯   0
            ⋮
            −a_n  −a_{n−1}  ⋯  −a₁ ] col [x₁(k), x₂(k), …, x_n(k)] + col [0, 0, …, 1] u(k)    (11-37)
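Example 11-3's controllable canonical form can be checked by simulation: drive the state-variable model (11-37) with an input and compare its output against a direct recursion of the difference equation (11-36). The second-order coefficients below are hypothetical, chosen only so that the model is stable.

```python
import numpy as np

# ARMA(2) of Eq. (11-36): y(k+2) + a1*y(k+1) + a2*y(k) = b1*u(k+1) + b2*u(k)
a1, a2 = -1.5, 0.7        # hypothetical pole coefficients (roots inside unit circle)
b1, b2 = 1.0, 0.5         # hypothetical zero coefficients

# Controllable canonical form, Eq. (11-37), n = 2
Phi = np.array([[0.0, 1.0],
                [-a2, -a1]])
Psi = np.array([0.0, 1.0])
Hrow = np.array([b2, b1])  # output row: y(k) = b2*x1(k) + b1*x2(k)

N = 50
u = np.sin(0.3 * np.arange(N))

# Simulate the state-variable model (zero initial state)
x = np.zeros(2)
y_state = np.zeros(N)
for k in range(N):
    y_state[k] = Hrow @ x
    x = Phi @ x + Psi * u[k]

# Direct recursion of the difference equation; zero initial state
# implies y(0) = 0 and y(1) = b1*u(0)
y_direct = np.zeros(N)
y_direct[1] = b1 * u[0]
for k in range(N - 2):
    y_direct[k + 2] = (-a1 * y_direct[k + 1] - a2 * y_direct[k]
                       + b1 * u[k + 1] + b2 * u[k])

print(np.max(np.abs(y_state - y_direct)))   # agreement to machine precision
```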
and

    y(k) = [β_n, β_{n−1}, …, β₁] x(k)    (11-38)

For this model there are no unknown parameters in matrix Ψ, and Φ and H each contain exactly n (unknown) parameters. Matrix Φ contains the n a-parameters, which are associated with the poles of H(z), whereas matrix H contains the n β-parameters, which are associated with the zeros of H(z). In general, an nth-order, single-input single-output system is completely characterized by 2n parameters. □

Our objective now is to determine L(θ|Z) for the system in (11-32) and (11-33). To begin, we must determine p(Z|θ) = p(z(1), z(2), …, z(N)|θ). This is easy to do, because of the whiteness of noise v(k), i.e., p(z(1), z(2), …, z(N)|θ) = p(z(1)|θ) p(z(2)|θ) ⋯ p(z(N)|θ); thus,

    L(θ|Z) = ln [ ∏_{i=1}^{N} p(z(i)|θ) ]    (11-39)

From the Gaussian nature of v(k) and the linear measurement model in (11-33), we know that

    p(Z|θ) = ∏_{i=1}^{N} [(2π)^m |R|]^(−1/2) exp {−½ [z(i) − Hx(i)]'R⁻¹[z(i) − Hx(i)]}    (11-40)

thus,

    L(θ|Z) = −½ Σ_{i=1}^{N} [z(i) − Hx(i)]'R⁻¹[z(i) − Hx(i)] − (N/2) ln |R| − (Nm/2) ln 2π    (11-41)

The log-likelihood function L(θ|Z) is a function of θ. To indicate which quantities on the right-hand side of (11-41) may depend on θ, we subscript all such quantities with θ. Additionally, because the term in 2π does not depend on θ, we neglect it in subsequent discussions. Our final log-likelihood function is:

    L(θ|Z) = −½ Σ_{i=1}^{N} [z(i) − H_θ x_θ(i)]'R_θ⁻¹[z(i) − H_θ x_θ(i)] − (N/2) ln |R_θ|    (11-42)

Observe that θ occurs explicitly and implicitly in L(θ|Z). Matrices H_θ and R_θ contain the explicit dependence of L(θ|Z) on θ, whereas state vector x_θ(i) contains the implicit dependence of L(θ|Z) on θ. In order to numerically calculate the right-hand side of (11-42), we must solve state equation (11-32). In essence, then, state equation (11-43) is a constraint that is associated with the computation of the log-likelihood function.
How do we determine θ̂_ML for L(θ|Z) given in (11-42) [subject to its constraint in (11-43)]? No simple closed-form solution is possible, because θ enters into L(θ|Z) in a complicated nonlinear manner. The only way presently known for obtaining θ̂_ML is by means of mathematical programming, which is beyond the scope of this course (e.g., see Mendel, 1983, pp. 141-142).

Comment. This completes our studies of methods for estimating unknown deterministic parameters. Prior to studying methods for estimating unknown random parameters, we pause to review an important body of material about multivariate Gaussian random variables.

PROBLEMS

11-1. If â is the MLE of a, what is the MLE of 1/a? Explain your answer.
11-2. (Sorenson, 1980, Theorem 5.1, pg. 185). If an estimator exists such that equality is satisfied in the Cramer-Rao inequality, prove that it can be determined as the solution of the likelihood equation.
11-3. Consider a sequence of independently distributed random variables x₁, x₂, …, x_N, having the probability density function θ²x_i e^{−θx_i}, where θ > 0.
(a) Derive θ̂_ML(N).
(b) You want to study whether or not θ̂_ML(N) is an unbiased estimator of θ. Explain (without working out all the details) how you would do this.
11-4. Consider a random variable z which can take on the values z = 0, 1, 2, … . This variable is Poisson distributed, i.e., its probability function P(z) is P(z) = μ^z e^{−μ}/z!. Let z(1), z(2), …, z(N) denote N independent observations of z. Find μ̂_ML(N).
11-5. If p(z) = θe^{−θz}, z > 0, and p(z) = 0, otherwise, find θ̂_ML, given a sample of N independent observations.
11-6. Find the maximum-likelihood estimator of θ from a sample of N independent observations that are uniformly distributed over the interval (0, θ).
11-7. Derive the maximum-likelihood estimate of the signal power, defined as θ, in the signal z(k) = s(k), k = 0, 1, …, N, where s(k) is a scalar, stationary, zero-mean Gaussian random sequence with autocorrelation E{s(i)s(j)} = θρ(j − i).
11-8. Suppose x is a binary variable that assumes a value of 1 with probability a and a value of 0 with probability (1 − a). The probability distribution of x can be described by

    P(x) = (1 − a)^{1−x} a^x

Suppose we draw a random sample of N values {x₁, x₂, …, x_N}. Find the MLE of a.
This can be done when vaIuesof the unknown parameterswhich appear in @ of a.
and V are given specific values; for, then (11-32)becomes 11-9. (Mendel, 1973, Exercise 3-15, pg. 177). Consider the linear model for which
x@(O)
known (1l-43) V(k) is Gaussian with zero mean and %(A-)= (~1.
x@(k+ 1) = @G@(~)
+ %Q4
(a) Show that the maximum-likelihood estimator of σ², denoted σ̂²_ML, is

σ̂²_ML = (Z - Hθ̂_ML)'(Z - Hθ̂_ML)/N

where θ̂_ML is the maximum-likelihood estimator of θ.
(b) Show that σ̂²_ML is biased, but that it is asymptotically unbiased.
11-10. We are given N independent samples z(1), z(2), ..., z(N) of the identically distributed two-dimensional random vector z(i) = col[z₁(i), z₂(i)], with the Gaussian density function

p(z₁(i), z₂(i)|ρ) = [2π√(1 - ρ²)]⁻¹ exp{-[z₁²(i) - 2ρz₁(i)z₂(i) + z₂²(i)]/[2(1 - ρ²)]}

where ρ is the correlation coefficient between z₁ and z₂.
(a) Determine ρ̂_ML (Hint: You will obtain a cubic equation for ρ̂_ML and must show that ρ̂_ML = r, where r equals the sample correlation coefficient).
(b) Determine the Cramer-Rao bound for ρ̂_ML.
11-11. It is well known, from linear system theory, that the choice of state variables used to describe a dynamical system is not unique. Consider the nth-order difference equation

y(k + n) + a₁y(k + n - 1) + ··· + a_n y(k) = b₁m(k + n - 1) + b₂m(k + n - 2) + ··· + b_n m(k).

A state-variable model for this system is x(k + 1) = Ax(k) + βm(k), in which A is a companion-form matrix built from the aᵢ-parameters and β = col(β₁, ..., β_n) [the printed displays are illegible in this copy], and

y(k) = (1 0 ··· 0) x(k)

where the βᵢ-parameters are combinations of the aᵢ- and bᵢ-parameters [their definitions are illegible in this copy]. Suppose maximum-likelihood estimates have been determined for the aᵢ- and βᵢ-parameters. How does one compute the maximum-likelihood estimates of the bᵢ-parameters?
11-12. Prove that, for a random sample of measurements, a maximum-likelihood estimator is a consistent estimator. [Hints: (1) Show that E{∂ ln p(Z|θ)/∂θ | θ} = 0; (2) Expand ∂ ln p(Z|θ̂_ML)/∂θ in a Taylor series about θ, and show that ∂ ln p(Z|θ)/∂θ = -(θ̂_ML - θ) ∂² ln p(Z|θ*)/∂θ², where θ* = λθ + (1 - λ)θ̂_ML and 0 ≤ λ ≤ 1; (3) Show that ∂ ln p(Z|θ)/∂θ = Σ_{i=1}^{N} ∂ ln p(z(i)|θ)/∂θ and ∂² ln p(Z|θ)/∂θ² = Σ_{i=1}^{N} ∂² ln p(z(i)|θ)/∂θ²; (4) Using the strong law of large numbers to assert that, with probability one, sample averages converge to ensemble averages, and assuming that E{∂² ln p(z(i)|θ)/∂θ²} is negative definite, show that θ̂_ML → θ with probability one; thus, θ̂_ML is a consistent estimator of θ. The steps in this proof have been taken from Sorenson, 1980, pp. 187-191.]
11-13. Prove that, for a random sample of measurements, a maximum-likelihood estimator is asymptotically Gaussian with mean value θ and covariance matrix (NJ)⁻¹, where J is the Fisher information matrix for a single measurement z(i). [Hints: (1) Expand ∂ ln p(Z|θ̂_ML)/∂θ in a Taylor series about θ and neglect second- and higher-order terms; (2) Show that [step illegible in this copy]; (3) Let s(θ) = ∂ ln p(Z|θ)/∂θ and show that s(θ) = Σ_{i=1}^{N} sᵢ(θ), where sᵢ(θ) = ∂ ln p(z(i)|θ)/∂θ; (4) Let S̄ denote the sample mean of sᵢ(θ), and show that the distribution of S̄ asymptotically converges to a Gaussian distribution having mean zero and covariance J/N; (5) Using the strong law of large numbers we know that the sample average of ∂² ln p(z(i)|θ)/∂θ² converges to E{∂² ln p(z(i)|θ)/∂θ² | θ}; consequently, show that (θ̂_ML - θ)J is asymptotically Gaussian with zero mean and covariance J/N; (6) Complete the proof of the theorem. The steps in this proof have been taken from Sorenson, 1980, pg. 192.]
11-14. Prove that, for a random sample of measurements, a maximum-likelihood estimator is asymptotically efficient. [Hints: (1) Let S(θ) = -E{∂² ln p(Z|θ)/∂θ² | θ} and show that S(θ) = NJ, where J is the Fisher information matrix for a single measurement z(i); (2) Use the result stated in Problem 11-13 to complete the proof. The steps in this proof have been taken from Sorenson, 1980, pg. 193.]
LESSON 12  ELEMENTS OF MULTIVARIATE GAUSSIAN RANDOM VARIABLES

Gaussian random variables are important and widely used for at least two reasons. First, they often provide a model that is a reasonable approximation to observed random behavior. Second, if the random phenomenon that we observe at the macroscopic level is the superposition of an arbitrarily large number of independent random phenomena, which occur at the microscopic level, the macroscopic description is justifiably Gaussian.

Most (if not all) of the material in this lesson should be a review for a reader who has had a course in probability theory. We collect a wide range of facts about multivariate Gaussian random variables here, in one place, because they are often needed in the remaining lessons.

UNIVARIATE GAUSSIAN DENSITY FUNCTION

[The body of this section is illegible in this copy.]

Let y₁, y₂, ..., y_m be random variables, and y = col(y₁, y₂, ..., y_m). The function

p(y₁, ..., y_m) = p(y) = [(2π)^m |P_y|]^{-1/2} exp{-½ (y - m_y)' P_y⁻¹ (y - m_y)}   (12-2)

is said to be a multivariate (m-variate) Gaussian density function [i.e., y ~ N(y; m_y, P_y)]. In (12-2), m_y = E{y} and P_y is the covariance matrix of y.

JOINTLY GAUSSIAN RANDOM VECTORS

Let x and y individually be n- and m-dimensional Gaussian random vectors, i.e., x ~ N(x; m_x, P_x) and y ~ N(y; m_y, P_y). Let P_xy and P_yx denote the cross-covariance matrices between x and y, i.e.,

P_xy = E{(x - m_x)(y - m_y)'}   (12-5)

and

P_yx = E{(y - m_y)(x - m_x)'}   (12-6)

We are interested in the joint density between x and y, p(x, y). Vectors x and y are jointly Gaussian if [equation (12-7), illegible in this copy].

Note that if x and y are jointly Gaussian then they are marginally (i.e., individually) Gaussian. The converse is true if x and y are independent, but it is not necessarily true if they are not independent (Papoulis, 1965, pg. 184).

In order to evaluate p(x, y) in (12-7), we need |P_z| and P_z⁻¹. It is straightforward to compute |P_z| once the given values for P_x, P_y, P_xy, and P_yx are substituted into (12-10). It is often useful to be able to express the components of P_z⁻¹ directly in terms of the components of P_z. It is a straightforward exercise in algebra (just form P_z P_z⁻¹ = I and equate elements on both sides) to show that

[equation (12-11), giving the blocks A, B, and C of P_z⁻¹; illegible in this copy]

where

A = (P_x - P_xy P_y⁻¹ P_yx)⁻¹ = P_x⁻¹ + P_x⁻¹ P_xy C P_yx P_x⁻¹   (12-12)

and the companion expressions (12-13) and (12-14) for B and C [illegible in this copy].

Taking a closer look at the quadratic exponent, which we denote E(x, y), we find that

E(x, y) = (x - m_x)'A(x - m_x) + 2(x - m_x)'B(y - m_y) + (y - m_y)'(C - P_y⁻¹)(y - m_y)
        = (x - m_x)'A(x - m_x) - 2(x - m_x)'A P_xy P_y⁻¹ (y - m_y)
          + (y - m_y)' P_y⁻¹ P_yx A P_xy P_y⁻¹ (y - m_y)   (12-20)

In obtaining (12-20) we have used (12-13) and (12-14). We now recognize that (12-20) looks like a quadratic expression in x - m_x and P_xy P_y⁻¹ (y - m_y), and express it in factored form, as

E(x, y) = [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]' A [(x - m_x) - P_xy P_y⁻¹ (y - m_y)]   (12-21)

The associated conditional covariance is

P_{x|y} = P_x - P_xy P_y⁻¹ P_yx   (12-27)

Proof that E{x|y} is Gaussian follows from the linearity property applied to (12-30). An affine transformation of y has the structure Ty + f. E{x|y} has this structure; thus, it is an affine transformation. Note that if m_x = 0 and m_y = 0, then E{x|y} is a linear transformation of y. □

From (12-23), we see that [remainder of this passage illegible in this copy].

Proof (Theorem 12-4).
a. If we can show that y and z̃ are statistically independent, then (12-37) follows from Theorem 12-3. For Gaussian random vectors, however, uncorrelatedness implies independence.

To begin, we assert that y and z̃ are jointly Gaussian, because z̃ = z - E{z|y} = z - m_z - P_zy P_y⁻¹ (y - m_y) depends on y and z, which are jointly Gaussian.

Next, we show that z̃ is zero mean. This follows from the calculation

m_z̃ = E{z - E{z|y}} = E{z} - E{E{z|y}}   (12-38)

where the outer expectation in the second term on the right-hand side of (12-38) is with respect to y. From probability theory (Papoulis, 1965, pg. 208), E{z} can be expressed as

E{z} = E{E{z|y}}   (12-39)

From (12-38) and (12-39), we see that m_z̃ = 0.

Finally, we show that y and z̃ are uncorrelated. This follows from the calculation

E{(y - m_y)(z̃ - m_z̃)'} = E{(y - m_y)z̃'} = E{y z̃'}
                        = E{y z'} - E{y E{z'|y}}   (12-40)
                        = E{y z'} - E{y z'} = 0

b. A detailed proof is given by Meditch (1969, pp. 101-102). The idea is to (1) compute E{x|y, z} in expanded form, (2) compute E{x|y, z̃} in expanded form, using z̃ given in (12-36), and (3) compare the results from (1) and (2) to prove the truth of (12-35). □

PROBLEMS

12-1. Fill in all of the details required to prove part b of Theorem 12-4.
12-2. Let x, y, and z be jointly distributed random vectors; c and h fixed constants; and g(·) a scalar-valued function. Assume E{x}, E{z}, and E{g(y)x} exist. Prove the following useful properties of conditional expectation:
(a) E{x|y} = E{x} if x and y are independent
(b) E{g(y)x|y} = g(y)E{x|y}
(c) E{c|y} = c
(d) E{g(y)|y} = g(y)
(e) E{cx + hz|y} = cE{x|y} + hE{z|y}
(f) E{x} = E{E{x|y}}, where the outer expectation is with respect to y
(g) E{g(y)z} = E{g(y)E{z|y}}, where the outer expectation is with respect to y.
12-3. Prove that the cross-covariance matrix of two uncorrelated random vectors is zero.
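The zero-mean and uncorrelatedness steps in the proof of Theorem 12-4 can be checked numerically. Below is a minimal scalar Monte Carlo sketch; the particular means, variances, and sample size are arbitrary illustrative choices, not values from the text.

```python
import random

# For jointly Gaussian y and z, z_tilde = z - E{z|y} = z - m_z - (P_zy/P_y)(y - m_y)
# should be zero mean and uncorrelated with y (steps (12-38)-(12-40), scalar case).
random.seed(7)
m_y, m_z = 1.0, -2.0
P_y, P_zy = 4.0, 1.5          # var(y) and cov(z, y): arbitrary choices

samples = []
for _ in range(200_000):
    y = m_y + 2.0 * random.gauss(0.0, 1.0)                    # var(y) = 4
    # z correlated with y: regression part plus independent Gaussian noise
    z = m_z + (P_zy / P_y) * (y - m_y) + random.gauss(0.0, 1.0)
    z_tilde = z - m_z - (P_zy / P_y) * (y - m_y)
    samples.append((y, z_tilde))

n = len(samples)
mean_zt = sum(zt for _, zt in samples) / n                    # should be near 0
cov_y_zt = sum((y - m_y) * zt for y, zt in samples) / n       # should be near 0
print(mean_zt, cov_y_zt)
```

Both sample statistics shrink toward zero as the sample grows, which is the Monte Carlo counterpart of m_z̃ = 0 and E{(y - m_y)z̃'} = 0.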
LESSON 13  ESTIMATION OF RANDOM PARAMETERS: GENERAL RESULTS

In Lesson 2 we showed that state estimation and deconvolution can be viewed as problems in which we are interested in estimating a vector of random parameters. For us, state estimation and deconvolution serve as the primary motivation for studying methods for estimating random parameters; however, the statistics literature is filled with other applications for these methods.

We now view θ as an n × 1 vector of random unknown parameters. The information available to us are measurements z(1), z(2), ..., z(k), which are assumed to depend upon θ. In this lesson, we do not begin by assuming a specific structural dependency between z(i) and θ. This is quite different from what we did in WLSE and BLUE. Those methods were studied for the linear model z(i) = H(i)θ + v(i), and closed-form solutions for θ̂_WLS(k) and θ̂_BLU(k) could not have been obtained had we not begun by assuming the linear model. We shall study the estimation of random θ for the linear model in Lesson 14.

In this lesson we examine two methods for estimating a vector of random parameters. The first method is based upon minimizing the mean-squared error between θ and θ̂(k). The resulting estimator is called a mean-squared estimator, and is denoted θ̂_MS(k). The second method is based upon maximizing an unconditional likelihood function, one that not only requires knowledge of p(Z|θ) but also of p(θ). The resulting estimator is called a maximum a posteriori estimator, and is denoted θ̂_MAP(k).

MEAN-SQUARED ESTIMATION

We now show that the notion of conditional expectation is central to the calculation of θ̂_MS(k). As usual, we let

Z(k) = col[z(k), z(k - 1), ..., z(1)]   (13-4)

The underlying random quantities in our estimation problem are θ and Z(k). We assume that their joint density function p[θ, Z(k)] exists, so that

[equation (13-5), illegible in this copy]

where dθ = dθ₁ dθ₂ ··· dθ_n, dZ(k) = dz₁(1) ··· dz₁(k) dz₂(1) ··· dz₂(k) ··· dz_m(1) ··· dz_m(k), and there are n + km integrals. Using the fact that

p[θ, Z(k)] = p[θ|Z(k)] p[Z(k)]   (13-6)

we rewrite (13-5) as

[equation (13-7), illegible in this copy]

From this equation we see that minimizing the conditional expectation E{θ̃_MS'(k)θ̃_MS(k)|Z(k)} with respect to θ̂_MS(k) is equivalent to our original objective of minimizing the total expectation E{θ̃_MS'(k)θ̃_MS(k)}. Note that the integrals on the right-hand side of (13-7) remove the dependency of the integrand on the data Z(k).

In summary, we have the following mean-squared estimation problem: Given the measurements z(1), z(2), ..., z(k), determine an estimator of θ, namely,

θ̂_MS(k) = φ[z(i), i = 1, 2, ..., k]

such that the conditional mean-squared error

J₁[θ̂_MS(k)] = E{θ̃_MS'(k)θ̃_MS(k) | z(1), ..., z(k)}   (13-8)

is minimized.

Derivation of Estimator

The solution to the mean-squared estimation problem is given in Theorem 13-1, which is known as the Fundamental Theorem of Estimation Theory.

Theorem 13-1. The estimator that minimizes the mean-squared error is

θ̂_MS(k) = E{θ|Z(k)}   (13-9)

Proof (Mendel, 1983b). In this proof we omit all functional dependences on k, for notational simplicity. Our approach is to substitute θ̃_MS(k) = θ - θ̂_MS(k) into (13-8) and to complete the square, as follows:

J₁[θ̂_MS(k)] = E{(θ - θ̂_MS)'(θ - θ̂_MS)|Z}
            = E{θ'θ - θ'θ̂_MS - θ̂_MS'θ + θ̂_MS'θ̂_MS | Z}   (13-10)
            = E{θ'θ|Z} - E{θ'|Z}θ̂_MS - θ̂_MS'E{θ|Z} + θ̂_MS'θ̂_MS
            = E{θ'θ|Z} + [θ̂_MS - E{θ|Z}]'[θ̂_MS - E{θ|Z}] - E{θ'|Z}E{θ|Z}

To obtain the third line we used the fact that θ̂_MS, by definition, is a function of Z; hence, E{θ̂_MS|Z} = θ̂_MS. The first and last terms in (13-10) do not depend on θ̂_MS; hence, the smallest value of J₁[θ̂_MS(k)] is obviously attained by setting the bracketed terms equal to zero. This means that θ̂_MS must be chosen as in (13-9). □

Let J₁*[θ̂_MS(k)] denote the minimum value of J₁[θ̂_MS(k)]. We see, from (13-10) and (13-9), that

J₁*[θ̂_MS(k)] = E{θ'θ|Z} - θ̂_MS'(k)θ̂_MS(k)   (13-11)

As it stands, (13-9) is not terribly useful for computing θ̂_MS(k). In general, we must first compute p[θ|Z(k)] and then perform the requisite number of integrations of θp[θ|Z(k)] to obtain θ̂_MS(k). In the special but important case when θ and Z(k) are jointly Gaussian we have a very important and practical corollary to Theorem 13-1.

Corollary 13-1. When θ and Z(k) are jointly Gaussian, the estimator that minimizes the mean-squared error is

θ̂_MS(k) = m_θ + P_θZ(k) P_Z⁻¹(k) [Z(k) - m_Z(k)]   (13-12)

Proof. When θ and Z(k) are jointly Gaussian then E{θ|Z(k)} can be evaluated using (12-17) of Lesson 12. Doing this we obtain (13-12). □

Corollary 13-1 gives us an explicit structure for θ̂_MS(k). We see that θ̂_MS(k) is an affine transformation of Z(k). If m_θ = 0 and m_Z(k) = 0, then θ̂_MS(k) is a linear transformation of Z(k).

In order to compute θ̂_MS(k) using (13-12), we must know m_θ and m_Z(k), and we must first compute P_θZ(k) and P_Z(k). We perform these computations in Lesson 14 for the linear model, Z(k) = H(k)θ + V(k).

Corollary 13-2. Suppose θ and Z(k) are not necessarily jointly Gaussian, and that we know m_θ, m_Z(k), P_Z(k), and P_θZ(k). In this case, the estimator that is constrained to be an affine transformation of Z(k), and that minimizes the mean-squared error, is also given by (13-12).

Proof. This corollary can be proved in a number of ways. A direct proof begins by assuming that θ̂_MS(k) = A(k)Z(k) + b(k) and choosing A(k) and b(k) so that θ̂_MS(k) is an unbiased estimator of θ and E{θ̃_MS'(k)θ̃_MS(k)} = trace E{θ̃_MS(k)θ̃_MS'(k)} is minimized. We leave the details of this direct proof to the reader.

A less direct proof is based upon the following Gedanken experiment. Using known first and second moments of θ and Z(k), we can conceptualize unique Gaussian random vectors that have these same first and second moments. For these statistically equivalent (through second-order moments) Gaussian vectors, we know, from Corollary 13-1, that the mean-squared estimator is given by the affine transformation of Z(k) in (13-12). □

Corollaries 13-1 and 13-2, as well as Theorem 13-1, provide us with the answer to the following important question: When is the linear (affine) mean-squared estimator the same as the mean-squared estimator? The answer is, when θ and Z(k) are jointly Gaussian. If θ and Z(k) are not jointly Gaussian, then θ̂_MS(k) = E{θ|Z(k)}, which, in general, is a nonlinear function of measurements Z(k); i.e., it is a nonlinear estimator.

Corollary 13-3 (Orthogonality Principle). Suppose f[Z(k)] is any function of the data Z(k). Then the error in the mean-squared estimator is orthogonal to f[Z(k)] in the sense that

E{[θ - θ̂_MS(k)] f'[Z(k)]} = 0   (13-13)

Proof (Mendel, 1983b, pp. 46-47). We use the following result from probability theory (Papoulis, 1965; see also Problem 12-2(g)). Let α and β be jointly distributed random vectors and g(β) be a scalar-valued function; then E{g(β)α} = E{g(β)E{α|β}}. [The remainder of the proof is illegible in this copy.]

Property 2 (Minimum Variance). Dispersion about the mean value of θ̂_MS(k) is measured by the error variance σ²_{θ̃ᵢ,MS}(k), where i = 1, 2, ..., n. An estimator that has the smallest error variance is a minimum-variance estimator (an MVE). The mean-squared estimator in (13-12) is an MVE.

Proof. From Property 1 and the definition of error variance, we see that

σ²_{θ̃ᵢ,MS}(k) = E{θ̃²ᵢ,MS(k)}, i = 1, 2, ..., n   (13-16)

Our mean-squared estimator was obtained by minimizing J[θ̂_MS(k)] in (13-2), which can now be expressed as

J[θ̂_MS(k)] = Σ_{i=1}^{n} σ²_{θ̃ᵢ,MS}(k)   (13-17)

Because variances are always positive, the minimum value of J[θ̂_MS(k)] must be achieved when each of the n variances is minimized; hence, our mean-squared estimator is equivalent to an MVE. □

Property 5 (Uniqueness). Mean-squared estimator θ̂_MS(k), in (13-12), is unique.

The proof of this property is not central to our developments; hence, it is omitted.

Generalizations

Many of the results presented in this section are applicable to objective functions other than the mean-squared objective function in (13-2). See Meditch (1969) for discussions on a wide number of objective functions that lead to E{θ|Z(k)} as the optimal estimator of θ.
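Corollary 13-1's affine estimator and the orthogonality principle of Corollary 13-3 can be illustrated numerically in the scalar case. The model and constants below are illustrative assumptions, not from the text: θ is Gaussian with mean 2 and variance 1, and z = θ + v with independent noise of variance 0.25.

```python
import random

# Scalar version of (13-12): theta_hat = m_theta + (P_theta_z / P_z)(z - m_z).
# We verify that its mean-squared error matches theory and that the error is
# orthogonal to the data, as (13-13) requires.
random.seed(3)
m_theta, sig_theta, sig_v = 2.0, 1.0, 0.5

draws = []
for _ in range(200_000):
    theta = m_theta + sig_theta * random.gauss(0.0, 1.0)
    z = theta + sig_v * random.gauss(0.0, 1.0)        # linear model z = theta + v
    draws.append((theta, z))

# Moments needed by (13-12) for this model:
m_z = m_theta
P_z = sig_theta**2 + sig_v**2
P_tz = sig_theta**2

errors = [(theta - (m_theta + (P_tz / P_z) * (z - m_z)), z) for theta, z in draws]
mse_ms = sum(e * e for e, _ in errors) / len(errors)   # theory: 1 - 1/1.25 = 0.2
ortho = sum(e * z for e, z in errors) / len(errors)    # f[Z] = Z in (13-13)
print(mse_ms, ortho)
```

The sample MSE approaches P_θ - P_θz²/P_z = 0.2, and the error-data correlation approaches zero, as the corollaries predict.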
MAXIMUM A POSTERIORI ESTIMATION

Recall Bayes's rule (Papoulis, 1965, pg. 39):

p(θ|Z(k)) = p(Z(k)|θ) p(θ) / p(Z(k))   (13-18)

where p(Z(k)|θ) is the conditional density function, and p(θ) is the prior probability density function for θ. Observe that p(θ|Z(k)) is related to likelihood function l(θ|Z(k)), because l(θ|Z(k)) ∝ p(Z(k)|θ). Additionally, because p(Z(k)) does not depend on θ,

p(θ|Z(k)) ∝ p(Z(k)|θ) p(θ)   (13-19)

In maximum a posteriori (MAP) estimation, values of θ are found that maximize p(θ|Z(k)) in (13-19); such estimates are known as MAP estimates, and will be denoted θ̂_MAP(k).

If θ₁, θ₂, ..., θ_n are uniformly distributed, then p(θ|Z(k)) ∝ p(Z(k)|θ), and the MAP estimator of θ equals the ML estimator of θ. Generally, MAP estimates are quite different from ML estimates. For example, the invariance property of MLEs usually does not carry over to MAP estimates. One reason for this can be seen from (13-19). Suppose, for example, that φ = g(θ) and we want to determine φ̂_MAP by first computing θ̂_MAP. Because p(θ) depends on the Jacobian matrix of g⁻¹(φ), φ̂_MAP ≠ g(θ̂_MAP). As Kashyap and Rao (1976, pg. 137) note, the two estimates are usually asymptotically identical to one another, since in the large sample case the knowledge of the observations swamps that of the prior distribution. For additional discussions on the asymptotic properties of MAP estimators, see Zacks (1971).

Quantity p(θ|Z(k)) in (13-19) is sometimes called an unconditional likelihood function, because the random nature of θ has been accounted for by p(θ). Density p(Z(k)|θ) is then called a conditional likelihood function (Nahi, 1969).

Example. [The setup of this example is illegible in this copy; a random sample z(1), ..., z(N) is drawn with common mean μ.] The value of μ, which is unknown to us in this example, is transferred from the first random number generator to the second one before we can obtain the given random sample. Using the facts that

p(μ) = (2πσ²_μ)^{-1/2} exp{-μ²/(2σ²_μ)}   (13-21)

and [the conditional density of the sample, (13-22), illegible in this copy], we obtain the posterior density (13-23). Taking the logarithm of (13-23) and neglecting the terms which do not depend upon μ, we obtain

L_MAP(μ|Z(N)) = -½ Σ_{i=1}^{N} [z(i) - μ]²/σ² - ½ μ²/σ²_μ   (13-24)

Setting ∂L_MAP/∂μ = 0, and solving for μ̂_MAP(N), we find that

μ̂_MAP(N) = [Σ_{i=1}^{N} z(i)] / (N + σ²/σ²_μ)   (13-25)

Next, we compare μ̂_MAP(N) and μ̂_ML(N), where [see (11-19) from Lesson 11]

μ̂_ML(N) = (1/N) Σ_{i=1}^{N} z(i)   (13-26)
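The comparison between the MAP and ML estimators of the mean can be sketched in a few lines. In this minimal sketch, σ (measurement-noise standard deviation) and σ_μ (prior standard deviation on μ) are assumed known, and the data values are hypothetical:

```python
# Maximizing L_MAP = -(1/(2 sigma^2)) sum (z(i)-mu)^2 - mu^2/(2 sigma_mu^2)
# gives mu_MAP = sum z(i) / (N + sigma^2/sigma_mu^2); the ML estimate is the
# sample mean. The zero-mean prior shrinks mu_MAP toward zero.
def mu_ml(z):
    return sum(z) / len(z)

def mu_map(z, sigma, sigma_mu):
    return sum(z) / (len(z) + sigma**2 / sigma_mu**2)

z = [1.2, 0.8, 1.1, 0.9]                      # hypothetical sample
print(mu_ml(z))                               # sample mean
print(mu_map(z, sigma=1.0, sigma_mu=1.0))     # shrunk toward the prior mean
```

As σ_μ grows (a vaguer prior), σ²/σ²_μ → 0 and μ̂_MAP → μ̂_ML, which is the large-sample/weak-prior agreement the text describes.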
[Theorem 13-2, partially illegible in this copy, establishes the following:] ... where θ̂_MAP(k) is uniquely equal to the linear (i.e., affine), unbiased mean-squared estimator of θ, θ̂_MS(k), and

m(k) = E{θ|Z(k)}   (13-28)

θ̂_MAP(k) is found by maximizing p(θ|Z(k)), or equivalently by minimizing the argument of the exponential in (13-27). The minimum value of [θ - m(k)]' P⁻¹(k) [θ - m(k)] is zero, and this occurs when θ̂_MAP(k) = m(k).

PROBLEMS

[Earlier parts of this problem are illegible in this copy.]
(c) For random vectors x, y, z, where y and z are uncorrelated, prove that

Ê{x|y, z} = Ê{x|y} + Ê{x|z} - m_x

(d) For random vectors x, y, z, where y and z are correlated, prove that

Ê{x|y, z} = Ê{x|y, z̃}

where

z̃ = z - Ê{z|y}

so that

Ê{x|y, z} = Ê{x|y} + Ê{x|z̃} - m_x
Theorem 14-1. When P_θ⁻¹ = 0, θ̂_MS(k) = θ̂_BLU(k).

Proof. Set P_θ⁻¹ = 0 in (14-11), to see that

P_θ̃(k) = [H'(k) R⁻¹(k) H(k)]⁻¹   (14-15)

and, therefore,

θ̂_MS(k) = P_θ̃(k) H'(k) R⁻¹(k) Z(k)   (14-16)

Compare (14-16) and (9-22), to conclude that θ̂_MS(k) = θ̂_BLU(k). □

One of the most startling aspects of Theorem 14-1 is that it shows us that BLU estimation applies to random parameters as well as to deterministic parameters. We return to a reexamination of BLUE below.

What does the condition P_θ⁻¹ = 0, given in Theorem 14-1, mean? Suppose, for example, that the elements of θ are uncorrelated; then, P_θ is a diagonal matrix, with diagonal elements σ²ᵢ. When all of these variances are very large, then P_θ⁻¹ ≈ 0. A large variance for θᵢ means we have no idea where θᵢ is located about its mean value.

Example 14-1 (Minimum-Variance Deconvolution)

In Example 2-6 we showed that, for the application of deconvolution, our linear model is

Z(N) = H(N - 1)μ + V(N)   (14-17)

We shall assume that μ and V(N) are jointly Gaussian, and that m_μ = 0 and m_V = 0; hence, m_Z = 0. Additionally, we assume that cov[V(N)] = ρI. From (14-7), we determine the following formula for μ̂_MS(N):

μ̂_MS(N) = P_μ H'(N - 1)[H(N - 1) P_μ H'(N - 1) + ρI]⁻¹ Z(N)   (14-18)

Recall, from Example 2-6, that when μ(k) is described by the product model μ(k) = q(k)r(k), then

μ = Q_q r   (14-19)

where

Q_q = diag[q(1), q(2), ..., q(N)]   (14-20)

and

r = col(r(1), r(2), ..., r(N))   (14-21)

In the product model, r(k) is white Gaussian noise with variance σ²_r, and q(k) is a Bernoulli sequence. Obviously, if we know Q_q then μ is Gaussian, in which case

P_μ = σ²_r Q_q Q_q' = σ²_r Q_q   (14-22)

where we have used the fact that Q_q² = Q_q, because q(k) = 0 or 1. When Q_q is known, (14-18) becomes

μ̂_MS(N|Q_q) = σ²_r Q_q H'(N - 1)[σ²_r H(N - 1) Q_q H'(N - 1) + ρI]⁻¹ Z(N)   (14-23)

Although μ̂_MS is a mean-squared estimator, so that it enjoys all of the properties of such an estimator (e.g., unbiased, minimum-variance, etc.), μ̂_MS is not a consistent estimator of μ. Consistency is a large sample property of an estimator; however, as N increases, the dimension of μ increases, because μ is N × 1. Consequently, we cannot prove consistency of μ̂_MS (recall that, in all other problems, θ is n × 1, where n is data independent; in those problems we can study consistency of θ̂).

Equations (14-18) and (14-23) are not very practical for actually computing μ̂_MS, because both require the inversion of an N × N matrix, and N can become quite large (it equals the number of measurements). We return to a more practical way for computing μ̂_MS in Lesson 22. □

BEST LINEAR UNBIASED ESTIMATION, REVISITED

In Lesson 9 we derived the BLUE of θ for the linear model (14-1), under the following assumptions about this model:

1. θ is a deterministic but unknown vector of parameters,
2. H(k) is deterministic, and
3. V(k) is zero-mean noise with covariance matrix R(k).

We assumed that θ̂_BLU(k) = F(k)Z(k) and chose F_BLU(k) so that θ̂_BLU(k) is an unbiased estimator of θ, and the error variance for each one of the n elements of θ is minimized. The reader should return to the derivation of θ̂_BLU(k) to see that the assumption that θ is deterministic is never needed, either in the derivation of the unbiasedness constraint [see the proof of Theorem 6-1, in which Equation (6-9) becomes [I - F(k)H(k)]E{θ} = 0, if θ is random], or in the derivation of J₁[θ, λ] in Equation (9-14) (due to some remarkable cancellations); thus, θ̂_BLU(k), given in (9-22), is applicable to random as well as deterministic parameters in our linear model (14-1); and, because the BLUE of θ is the special case of the WLSE when W(k) = R⁻¹(k), θ̂_WLS(k), given in (3-10), is also applicable to random as well as deterministic parameters in our linear model.

Theorem 14-1 relates θ̂_MS(k) and θ̂_BLU(k) under some very stringent conditions that are needed in order to remove the dependence of θ̂_MS on the a priori statistical information about θ (i.e., m_θ and P_θ), because this information was never used in the derivation of θ̂_BLU(k).

Next, we derive a different BLUE of θ, one that incorporates the a priori statistical information about θ. To do this (Sorenson, 1980, pg. 210), we treat m_θ as an additional measurement which will be augmented to Z(k). Our additional measurement equation is obtained by adding and subtracting θ in the identity m_θ = m_θ, i.e.,

m_θ = θ + (m_θ - θ)   (14-24)

Quantity m_θ - θ is now treated as zero-mean noise with covariance matrix P_θ. Our augmented linear model is

col[Z(k), m_θ] = col[H(k), I] θ + col[V(k), m_θ - θ]   (14-25)

which can be written as

Z_a(k) = H_a(k)θ + V_a(k)   (14-26)

where Z_a(k), H_a(k), and V_a(k) are defined in (14-25). Additionally,

E{V_a(k) V_a'(k)} ≜ R_a(k) = diag[R(k), P_θ]   (14-27)

We now treat (14-26) as the starting point for derivation of a BLUE of θ, which we denote θ̂ᵃ_BLU(k). Obviously,

θ̂ᵃ_BLU(k) = [H_a'(k) R_a⁻¹(k) H_a(k)]⁻¹ H_a'(k) R_a⁻¹(k) Z_a(k)   (14-28)

To conclude this section, we note that the weighted least-squares objective function that is associated with θ̂ᵃ_BLU(k) is

J_a[θ̂(k)] = Ṽ_a'(k) R_a⁻¹(k) Ṽ_a(k)
          = (m_θ - θ̂)' P_θ⁻¹ (m_θ - θ̂) + Ṽ'(k) R⁻¹(k) Ṽ(k)   (14-35)

The first term in (14-35) contains all the a priori information about θ. Quantity m_θ - θ̂ is treated as the difference between measurement m_θ and its noise-free model, θ.

MAXIMUM A POSTERIORI ESTIMATOR

In order to determine θ̂_MAP(k) for the linear model in (14-1) we first need to determine p(θ|Z(k)). Using the facts that θ ~ N(θ; m_θ, P_θ) and V(k) ~ N(V(k); 0, R(k)), it follows that

p(θ) = [(2π)ⁿ |P_θ|]^{-1/2} exp{-½ (θ - m_θ)' P_θ⁻¹ (θ - m_θ)}   (14-36)

[The remainder of the derivation, (14-37)-(14-41), is illegible in this copy.] Put another way, for the linear Gaussian model, all roads lead to the same estimator. Of course, the fact that θ̂_MAP(k) = θ̂_MS(k) should not come as any surprise, because we already established it (in a model-free environment) in Theorem 13-2.

Example 14-2 (Maximum-Likelihood Deconvolution)

As in Example 14-1, we begin with the deconvolution linear model

Z(N) = H(N - 1)μ + V(N)   (14-42)

Now, however, we use the product model for μ, given in (14-19), to express Z(N) as

Z(N) = H(N - 1) Q_q r + V(N)   (14-43)

For notational convenience, let

q = col(q(1), q(2), ..., q(N))   (14-44)

Our objectives in this example are to obtain MAP estimators for both q and r. In the literature on maximum-likelihood deconvolution (e.g., Mendel, 1983b) these estimators are referred to as unconditional ML estimators, and are denoted q̂ and r̂. We denote these estimators as q̂_MAP and r̂_MAP, in order to be consistent with this book's notation.

The starting point for determining q̂_MAP and r̂_MAP is the joint density function p(r, q|Z(N)), where

p(r, q|Z(N)) = p(Z(N)|r, q) p(r, q) / p(Z(N))   (14-45)

but

p(r, q) = p(r|q) Pr(q)   (14-46)

Equation (14-46) uses a probability function for q rather than a probability density function, because q(k) takes on only two discrete values, 0 and 1. Substituting (14-46) into (14-45), we find that

p(r, q|Z(N)) = p(Z(N)|r, q) p(r|q) Pr(q) / p(Z(N))   (14-47)

Note, also, that

p(Z(N)|r, q) = p(r, Z(N)|q) / p(r)   (14-48)

where we have used the fact that r and q are statistically independent. Substituting (14-48) for p(Z(N)|r, q) into (14-47), we see that

p(r, q|Z(N)) = p(r, Z(N)|q) Pr(q) / p(Z(N))   (14-49)

Observe that the r dependence of the MAP-likelihood function is completely contained in p(r, Z(N)|q). Additionally, when q is given, the only remaining random quantities are the zero-mean Gaussian quantities r and Z(N); hence, p(r, Z(N)|q) is multivariate Gaussian. Letting

x = col(r, Z(N))   (14-50)

and

P_x = E{x x'|q}   (14-51)

then

p(r, Z(N)|q) = [(2π)^K |P_x|]^{-1/2} exp(-½ x' P_x⁻¹ x)   (14-52)

where K is the dimension of x.

We leave, as an exercise for the reader, the maximization of p(r, q|Z(N)), from which it follows (Mendel, 1983b, pp. 112-114) that

r*(N|q) = σ²_r Q_q H'(N - 1)[σ²_r H(N - 1) Q_q H'(N - 1) + ρI]⁻¹ Z(N)   (14-53)

q̂_MAP can be found by maximizing

[the q-likelihood function (14-54), illegible in this copy]

where

Pr(q) = ∏_{k=1}^{N} Pr[q(k)] = λ^{m_q} (1 - λ)^{N - m_q}   (14-55)

and

m_q = Σ_{k=1}^{N} q(k)   (14-56)

and, finally,

r̂_MAP = r*(N|q̂_MAP)   (14-57)

Equations (14-54) and (14-57) are quite interesting results. Observe that we are permitted first to direct our attention to finding q̂_MAP and then to finding r̂_MAP. Observe, also, that r̂_MAP = μ̂_MS(N|Q_q̂) [compare (14-53) and (14-23)].

There is no simple solution for determining q̂_MAP. Because the elements of q in p(r*, q|Z(N)) have nonlinear interactions, and because the elements of q are constrained to take on binary values, it is necessary to evaluate p(r*, q|Z(N)) for every possible q sequence to find the q for which p(r*, q|Z(N)) is a global maximum. Because q(k) can take on one of two possible values, there are 2^N possible sequences, where N is the number of elements in q. For reasonable values of N (such as N = 400), finding the global maximum of p(r*, q|Z(N)) would require several centuries of computer time.

We can always design a method for detecting significant values of μ(k) so that the resulting q̂ will be nearly as likely as the unconditional maximum-likelihood estimate, q̂_MAP. Two MAP detectors for accomplishing this are described in Mendel (1983b, pp. 127-137). [Note: The reader who is interested in more details about these MAP detectors should first review Chapter 5 in Mendel, 1983b, because they are designed using a slightly different likelihood function than the one in (14-54).] □

Example 14-3 (State Estimation)

In Example 2-4 we showed that, for the application of state estimation, our linear model is

Z(N) = H(N, k₁) x(k₁) + V(N, k₁)   (14-58)

From (2-17), we see that

x(k₁) = Φ^{k₁} x(0) + Σ_{i=1}^{k₁} Φ^{k₁-i} γ u(i - 1) = Φ^{k₁} x(0) + Lu   (14-59)

[Equations (14-60)-(14-63) are illegible in this copy.] In order to evaluate x̂(k₁|N) we must first compute m_{x(k₁)} and P_{x(k₁)}. We show how to do this in Lesson 15.

Formula (14-63) is very cumbersome. It appears that its right-hand side changes as a function of k₁ (and N). We conjecture, however, that it ought to be possible to express x̂(k₁|N) as an affine transformation of x̂(k₁ - 1|N), because x(k₁) is an affine transformation of x(k₁ - 1), i.e., x(k₁) = Φx(k₁ - 1) + γu(k₁ - 1).

Because of the importance of state estimation in many different fields (e.g., control theory, communication theory, signal processing, etc.) we shall examine it in great detail in many of our succeeding lessons. □

PROBLEMS

14-3. x and v are independent Gaussian random variables with zero means and variances σ²_x and σ²_v, respectively. We observe the single measurement z = x + v = 1.
(a) Find x̂_MS.
(b) Find x̂_MAP.
14-4. For the linear Gaussian model in which H(k) is deterministic, prove that θ̂_MAP is a most efficient estimator of θ. Do this in two different ways. Is θ̂_MS(k) a most efficient estimator of θ?
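Lesson 14's augmented-model BLUE, (14-24)-(14-28), reduces in the scalar case to an information-weighted average of the measurement and the prior mean. Below is a minimal sketch; the scalar model z = θ + v and the numbers are illustrative assumptions:

```python
# Scalar sketch of the augmented model (14-25): stack z = theta + v (var r)
# with the prior "measurement" m_theta = theta + (m_theta - theta) (var p0).
# The BLUE formula (14-28) with H_a = col(1, 1) and R_a = diag(r, p0) becomes
# an information-weighted average of z and m_theta.
def blue_with_prior(z, r, m_theta, p0):
    info = 1.0 / r + 1.0 / p0
    return (z / r + m_theta / p0) / info

print(blue_with_prior(z=3.0, r=1.0, m_theta=1.0, p0=1.0))   # prior pulls estimate toward m_theta
print(blue_with_prior(z=3.0, r=1.0, m_theta=1.0, p0=1e9))   # vague prior: essentially recovers z
```

The second call illustrates Theorem 14-1: as P_θ⁻¹ → 0 (here, p0 huge), the a priori information drops out and the estimator reduces to the ordinary BLUE of the measurement alone.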
Lesson 15
Elements of Discrete-Time Gauss-Markov Random Processes

DEFINITIONS AND PROPERTIES OF DISCRETE-TIME GAUSS-MARKOV RANDOM PROCESSES

Recall that a random process is a collection of random variables in which the notion of time plays a role.

Definition 15-1 (Meditch, 1969, pg. 106). A vector random process is a family of random vectors {s(t), t ∈ 𝒯} indexed by a parameter t, all of whose values lie in some appropriate index set 𝒯. When 𝒯 = {k: k = 0, 1, ...} we have a discrete-time random process. □

Definition 15-2 (Meditch, 1969, pg. 117). A vector random process {s(t), t ∈ 𝒯} is defined to be multivariate Gaussian if, for any ℓ time points t₁, t₂, ..., t_ℓ in 𝒯, where ℓ is an integer, the set of ℓ random n-vectors s(t₁), ..., s(t_ℓ) is jointly Gaussian. □

m_s(t) = E{s(t)}  (15-3)

Note that, in (15-5), s(t_m) ≤ S(t_m) means s_i(t_m) ≤ S_i(t_m) for i = 1, 2, ..., n. If we view time point t_m as the present time and time points t_{m-1}, ..., t₁ as the past, then a Markov process is one whose probability law (i.e., probability density function) depends only on the immediate past value, at t_{m-1}. This is often referred to as the Markov property for a vector random process. Because the probability law depends only on the immediate past value we often refer to such a process as a first-order Markov process (if it depended on the immediate two past values it would be a second-order Markov process).

Theorem 15-1. Let {s(t), t ∈ 𝒯} be a first-order Markov process, and t₁ < t₂ < ... < t_m be any time points in 𝒯, where m is an integer. Then
Equation (15-7) is obtained by successively substituting each one of the equations in (15-9) into (15-8). □

Theorem 15-1 demonstrates that a first-order Markov process is completely characterized by two probability density functions, namely, the transition probability density function, p[s(t_i)|s(t_{i-1})], and the initial (prior) probability density function, p[s(t₁)]. Note that generally the transition probability density functions can all be different, in which case they should be subscripted [e.g., p_m[s(t_m)|s(t_{m-1})] and p_{m-1}[s(t_{m-1})|s(t_{m-2})]].

Theorem 15-2. For a first-order Markov process,

E{s(t_m)|s(t_{m-1}), ..., s(t₁)} = E{s(t_m)|s(t_{m-1})}  (15-10) □

We leave the proof of this useful result as an exercise.

A vector random process that is both Gaussian and a first-order Markov process will be referred to in the sequel as a Gauss-Markov process.

Definition 15-4. A vector random process {s(t), t ∈ 𝒯} is said to be a Gaussian white process if, for any m time points t₁, t₂, ..., t_m in 𝒯, where m is any integer, the m random vectors s(t₁), s(t₂), ..., s(t_m) are uncorrelated Gaussian random vectors. □

White noise is zero mean, or else it cannot have a flat spectrum. For white noise,

E{s(t_i)s'(t_j)} = 0 for all i ≠ j  (15-11)

Additionally, for Gaussian white noise the random vectors s(t₁), s(t₂), ..., s(t_m) are statistically independent.

Equating (15-14) and (15-15), we obtain (15-13). □

Theorem 15-3 means that past and future values of s(t) in no way help determine present values of s(t). For Gaussian white processes, the transition probability density function equals the marginal density function, p[s(t)], which is multivariate Gaussian. Additionally,

E{s(t_m)|s(t_{m-1}), ..., s(t₁)} = E{s(t_m)}  (15-16)

A BASIC STATE-VARIABLE MODEL

In succeeding lessons we shall develop a variety of state estimators for the following basic linear, (possibly) time-varying, discrete-time dynamical system (our basic state-variable model), which is characterized by the n × 1 state vector x(k) and the m × 1 measurement vector z(k):

x(k+1) = Φ(k+1,k)x(k) + Γ(k+1,k)w(k) + Ψ(k+1,k)u(k)  (15-17)

and

z(k+1) = H(k+1)x(k+1) + v(k+1)  (15-18)

where k = 0, 1, .... In this model w(k) and v(k) are p × 1 and m × 1 mutually uncorrelated (possibly nonstationary) jointly Gaussian white noise sequences; i.e.,

E{w(i)w'(j)} = Q(i)δ_ij, E{v(i)v'(j)} = R(i)δ_ij, and E{w(i)v'(j)} = 0

Covariance matrix Q(i) is positive semidefinite and R(i) is positive definite [so that R⁻¹(i) exists]. Additionally, u(k) is an l × 1 vector of known system inputs, and initial state vector x(0) is multivariate Gaussian, with mean m_x(0) and covariance P_x(0), i.e.,

m_x(0) = E{x(0)} and P_x(0) = E{[x(0) - m_x(0)][x(0) - m_x(0)]'}  (15-22)

and x(0) is not correlated with w(k) and v(k). The dimensions of matrices Φ, Γ, Ψ, H, Q and R are n × n, n × p, n × l, m × n, p × p, and m × m, respectively.

Disturbance w(k) is often used to model the following types of uncertainty:

1. disturbance forces acting on the system (e.g., wind that buffets an airplane);
2. errors in modeling the system (e.g., neglected effects); and
3. errors, due to actuators, in the translation of the known input, u(k), into physical signals.

Vector v(k) is often used to model the following types of uncertainty:

1. errors in measurements made by sensing instruments;
2. unavoidable disturbances that act directly on the sensors; and
3. errors in the realization of feedback compensators using physical components [this is valid only when the measurement equation contains a direct throughput of the input u(k), i.e., when z(k+1) = H(k+1)x(k+1) + G(k+1)u(k+1) + v(k+1); we shall examine this situation in Lesson 23].

Of course, not all dynamical systems are described by this basic model. In general, w(k) and v(k) may be correlated, some measurements may be made so accurately that, for all practical purposes, they are perfect (i.e., there is no measurement noise associated with them), and either u(k) or v(k), or both, may be colored noise processes. We shall consider the modification of our basic state-variable model for each of these important situations in Lesson 23.

Theorem 15-4. When x(0) and w(k) are jointly Gaussian, then {x(k), k = 0, 1, ...} is a Gauss-Markov sequence.

Note that if x(0) and w(k) are individually Gaussian and statistically independent (or uncorrelated), then they will be jointly Gaussian (Papoulis, 1965).

Proof
a. Gaussian Property [assuming u(k) nonrandom]. Because u(k) is nonrandom, it has no effect on determining whether x(k) is Gaussian; hence, for this part of the proof we assume u(k) = 0. The solution to (15-17) is

x(k) = Φ(k,0)x(0) + Σ_{i=1}^{k} Φ(k,i)Γ(i,i-1)w(i-1)  (15-23)

where

Φ(k,i) = Φ(k,k-1)Φ(k-1,k-2) ··· Φ(i+1,i)  (15-24)

Observe that x(k) is a linear transformation of jointly Gaussian random vectors x(0), w(0), w(1), ..., w(k-1); hence, x(k) is Gaussian.
b. Markov Property. This property does not require x(k) or w(k) to be Gaussian. Because x satisfies state equation (15-17), we see that x(k) depends only on its immediate past value; hence, x(k) is Markov. □

We have been able to show that our dynamical system is Markov because we specified a model for it. Without such a specification, it would be quite difficult (or impossible) to test for the Markov nature of a random process.

By stacking up x(1), x(2), ... into a supervector it is easily seen that this supervector is just a linear transformation of jointly Gaussian quantities x(0), w(0), w(1), ...; hence, x(1), x(2), ... are themselves jointly Gaussian.

A Gauss-Markov sequence can be completely characterized in two ways:

1. specify the marginal density of the initial state vector, p[x(0)], and the transition density p[x(k+1)|x(k)], or
2. specify the mean and covariance of the state vector sequence. The second characterization is a complete one because Gaussian random vectors are completely characterized by their means and covariances (Lesson 12). We shall find the second characterization more useful than the first.
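To make the basic state-variable model concrete, here is a minimal Python sketch that simulates a scalar instance of (15-17)-(15-18). The particular numbers (φ = 0.5, q = 20, r = 5, and the N(4, 10) initial state) are illustrative and chosen to match the first-order system of Example 15-1 below; Γ, Ψ, and H are taken to be 1 and u = 0 for simplicity.

```python
import random

# Simulate a scalar instance of the basic state-variable model:
#   state equation (15-17):  x(k+1) = phi*x(k) + w(k)      (Gamma = 1, u = 0)
#   measurement   (15-18):   z(k+1) = x(k+1) + v(k+1)      (H = 1)
# w(k) ~ N(0, q) and v(k) ~ N(0, r) are mutually uncorrelated white noises.

def simulate(phi=0.5, q=20.0, r=5.0, m0=4.0, p0=10.0, n_steps=100, seed=1):
    rng = random.Random(seed)
    x = rng.gauss(m0, p0 ** 0.5)        # x(0) ~ N(m_x(0), p_x(0))
    xs, zs = [], []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, q ** 0.5)    # state equation (15-17)
        z = x + rng.gauss(0.0, r ** 0.5)          # measurement equation (15-18)
        xs.append(x)
        zs.append(z)
    return xs, zs

xs, zs = simulate()
print(len(xs), len(zs))   # 100 100
```

Each run of this sketch produces one sample path of the Gauss-Markov sequence {x(k)} and its noisy measurements {z(k)}; the statistics of these sequences are exactly what Theorems 15-5 and 15-6 characterize.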
PROPERTIES OF THE BASIC STATE-VARIABLE MODEL
The Gaussian density function for state vector x(k) is

p[x(k)] = [(2π)ⁿ|P_x(k)|]^(-1/2) exp{-½[x(k) - m_x(k)]'P_x⁻¹(k)[x(k) - m_x(k)]}  (15-25)

where

m_x(k) = E{x(k)}  (15-26)

and

P_x(k) = E{[x(k) - m_x(k)][x(k) - m_x(k)]'}  (15-27)

We now demonstrate that m_x(k) and P_x(k) can be computed by means of recursive equations.

Theorem 15-5. For our basic state-variable model,
a. m_x(k) can be computed from the vector recursive equation

m_x(k+1) = Φ(k+1,k)m_x(k) + Ψ(k+1,k)u(k)  (15-28)

where k = 0, 1, ..., and m_x(0) initializes (15-28);
b. P_x(k) can be computed from the matrix recursive equation

P_x(k+1) = Φ(k+1,k)P_x(k)Φ'(k+1,k) + Γ(k+1,k)Q(k)Γ'(k+1,k)  (15-29)

where k = 0, 1, ..., and P_x(0) initializes (15-29); and
c. E{[x(i) - m_x(i)][x(j) - m_x(j)]'} ≜ P_x(i,j) can be computed from

P_x(i,j) = Φ(i,j)P_x(j) when i > j, and P_x(i,j) = P_x(i)Φ'(j,i) when i < j  (15-30)

Proof
a. Take the expected value of both sides of (15-17), using the facts that expectation is a linear operation (Papoulis, 1965) and w(k) is zero mean, to obtain (15-28).
b. For notational simplicity, we omit the temporal arguments of Φ and Γ in this part of the proof. Using (15-17) and (15-28), we obtain

P_x(k+1) = E{[x(k+1) - m_x(k+1)][x(k+1) - m_x(k+1)]'}
         = E{[Φ(x(k) - m_x(k)) + Γw(k)][Φ(x(k) - m_x(k)) + Γw(k)]'}  (15-31)
         = ΦP_x(k)Φ' + ΓQ(k)Γ' + ΦE{[x(k) - m_x(k)]w'(k)}Γ' + ΓE{w(k)[x(k) - m_x(k)]'}Φ'

Because m_x(k) is not random and w(k) is zero mean, E{m_x(k)w'(k)} = m_x(k)E{w'(k)} = 0, and E{w(k)m_x'(k)} = 0. State vector x(k) depends at most on random input w(k-1) [see (15-17)]; hence,

E{x(k)w'(k)} = E{x(k)}E{w'(k)} = 0  (15-32)

and E{w(k)x'(k)} = 0 as well. The last two terms in (15-31) are therefore equal to zero, and the equation reduces to (15-29).
c. We leave the proof of (15-30) as an exercise. Observe that once we know covariance matrix P_x(k) it is an easy matter to determine any cross-covariance matrix between state x(k) and x(i) (i ≠ k). The Markov nature of our basic state-variable model is responsible for this. □

Observe that mean vector m_x(k) satisfies a deterministic vector state equation, (15-28), covariance matrix P_x(k) satisfies a deterministic matrix state equation, (15-29), and (15-28) and (15-29) are easily programmed for digital computation.

Next we direct our attention to the statistics of measurement vector z(k).

Theorem 15-6. For our basic state-variable model, when x(0), w(k) and v(k) are jointly Gaussian, then {z(k), k = 1, 2, ...} is Gaussian, and

m_z(k+1) = H(k+1)m_x(k+1)  (15-33)

and

P_z(k+1) = H(k+1)P_x(k+1)H'(k+1) + R(k+1)  (15-34)

where m_x(k+1) and P_x(k+1) are computed from (15-28) and (15-29), respectively. □

We leave the proof as an exercise for the reader. Note that if x(0), w(k) and v(k) are statistically independent and Gaussian they will be jointly Gaussian.

Example 15-1
Consider the simple single-input single-output first-order system

x(k+1) = ½x(k) + w(k)  (15-35)

z(k+1) = x(k+1) + v(k+1)  (15-36)

where w(k) and v(k) are wide-sense stationary white noise processes, for which q = 20 and r = 5. Additionally, m_x(0) = 4 and p_x(0) = 10.

The mean of x(k) is computed from the following homogeneous equation:

m_x(k+1) = ½m_x(k),  m_x(0) = 4  (15-37)
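The recursions (15-28) and (15-29), specialized to the scalar system of Example 15-1 (with γ = h = 1), can be sketched in a few lines of Python:

```python
# Propagate the mean (15-28) and variance (15-29) recursions for the
# scalar Example 15-1: m_x(k+1) = phi*m_x(k), p_x(k+1) = phi^2*p_x(k) + q,
# with phi = 1/2, q = 20, r = 5, m_x(0) = 4, p_x(0) = 10.
# The measurement variance then follows from (15-34): p_z(k) = p_x(k) + r.

def propagate(phi=0.5, q=20.0, r=5.0, m0=4.0, p0=10.0, n_steps=1):
    m, p = m0, p0
    for _ in range(n_steps):
        m = phi * m               # mean recursion (15-28)
        p = phi * p * phi + q     # covariance recursion (15-29), gamma = 1
    return m, p, p + r            # p_z from (15-34), h = 1

m, p, pz = propagate(n_steps=1)
print(m, p, pz)   # 2.0 22.5 27.5
```

Iterating further, p_x(k) approaches the steady-state value q/(1 - φ²) = 20/0.75 ≈ 26.67, which is the limiting variance one reads off such plots.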
[Figure 15-1: Mean (dashed) and standard deviation (bars) for the first-order system (15-35) and (15-36).]

SIGNAL-TO-NOISE RATIO

For the scalar system, generalized so that x(k+1) = φx(k) + γw(k) and z(k+1) = hx(k+1) + v(k+1), the signal-to-noise ratio in the measurement is

SNR(k) = E{[hx(k)]²}/E{v²(k)}  (15-45)

From preceding analyses, we see that

SNR(k) = h²p_x(k)/r

Scaled covariance p_x(k)/q is computed from the following version of (15-29):

p_x(k+1)/q = φ²[p_x(k)/q] + γ²  (15-49)

In steady state, therefore,

SNR = [h²γ²/(1 - φ²)](q/r)  (15-48)

One of the most useful ways for using (15-48) is to compute q/r for a given signal-to-noise ratio SNR, i.e.,

q/r = [(1 - φ²)/(h²γ²)] SNR  (15-50)

In Lesson 18 we show that q/r can be viewed as an estimator tuning parameter; hence, signal-to-noise ratio, SNR, can also be treated as such a parameter.

Observe that, if h²γ² = 1 - φ², then SNR = q/r. The condition h²γ² = 1 - φ² is satisfied if, for example, γ = 1, φ = 1/√2 and h = 1/√2. □

PROBLEMS

15-1. Prove Theorem 15-2, and then show that for Gaussian white noise E{s(t_m)|s(t_{m-1}), ..., s(t₁)} = E{s(t_m)}.
15-2. Derive the formula for the cross-covariance of x(k), P_x(i,j), given in (15-30).
15-3. Derive the first- and second-order statistics of measurement vector z(k) that are summarized in Theorem 15-6.
15-4. Reconsider the basic state-variable model when x(0) is correlated with w(0), and w(k) and v(k) are correlated [E{w(k)v'(k)} = S(k)].
(a) Show that the covariance equation for z(k) remains unchanged.
(b) Show that the covariance equation for x(k) is changed, but only at k = 1.
(c) Compute E{z(k+1)z'(k)}.
15-5. [Figure: block diagram of a system with impulse response h, input u(k), and additive measurement noise v(k).] In this problem, assume that u(k) and v(k) are individually Gaussian and uncorrelated. Impulse response h depends on parameter a, where a is a Gaussian random variable that is statistically independent of u(k) and v(k).
(a) Evaluate E{z(k)}.
(b) Explain whether or not z(k) is Gaussian.
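The steady-state relation (15-48) and the tuning relation (15-50) are easy to sketch in code; the numerical inputs below are illustrative:

```python
# Steady-state SNR from (15-48): SNR = [h^2*gamma^2 / (1 - phi^2)] * (q/r),
# and the inverse tuning relation (15-50) that recovers q/r from a given SNR.

def steady_state_snr(q_over_r, phi, gamma=1.0, h=1.0):
    return (h * h * gamma * gamma / (1.0 - phi * phi)) * q_over_r

def q_over_r_for(snr, phi, gamma=1.0, h=1.0):
    return snr * (1.0 - phi * phi) / (h * h * gamma * gamma)

# With gamma = 1 and phi = h = 1/sqrt(2), h^2*gamma^2 = 1 - phi^2,
# so SNR = q/r exactly, as noted in the text.
phi = h = 2 ** -0.5
print(round(steady_state_snr(4.0, phi, 1.0, h), 9))
```

The two functions are inverses of each other, which is precisely why q/r can serve as a single tuning knob for a desired SNR.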
Lesson 16
State Estimation: Prediction

INTRODUCTION

We have mentioned, a number of times in this book, that in state estimation three situations are possible, depending upon the relative relationship of the total number of available measurements, N, and the time point, k, at which we estimate state vector x(k), namely: prediction (N < k), filtering (N = k), and smoothing (N > k). In this lesson we develop algorithms for mean-squared predicted estimates, x̂_MS(k|j), of state x(k). In order to simplify our notation, we shall abbreviate x̂_MS(k|j) as x̂(k|j). (Just in case you have forgotten what the notation x̂(k|j) stands for, see Lesson 2.) Note that, in prediction, k > j.

SINGLE-STAGE PREDICTOR

The single-stage predictor is obtained by operating on both sides of state equation (15-17) with the linear expectation operator E{·|Z(k-1)}. Doing this, we find that

x̂(k|k-1) = Φ(k,k-1)x̂(k-1|k-1) + Ψ(k,k-1)u(k-1)  (16-4)

where k = 1, 2, .... To obtain (16-4) we have used the facts that E{w(k-1)} = 0 and u(k-1) is deterministic.

Observe, from (16-4), that the single-stage predicted estimate, x̂(k|k-1), depends on the filtered estimate, x̂(k-1|k-1), of the preceding state vector x(k-1). At this point, (16-4) is an interesting theoretical result; but there is nothing much we can do with it, because we do not as yet know how to compute filtered state estimates. In Lesson 17 we shall begin our study of filtered state estimates, and shall learn that such estimates of x(k) depend on predicted estimates of x(k), just as predicted estimates of x(k) depend on filtered estimates of x(k-1); thus, filtered and predicted state estimates are very tightly coupled together.

Let P(k|k-1) denote the error-covariance matrix that is associated with x̂(k|k-1), i.e.,

P(k|k-1) = E{[x̃(k|k-1) - m_x̃(k|k-1)][x̃(k|k-1) - m_x̃(k|k-1)]'}  (16-5)

where

x̃(k|k-1) = x(k) - x̂(k|k-1)  (16-6)

Additionally, let P(k-1|k-1) denote the error-covariance matrix that is associated with x̂(k-1|k-1), i.e.,
A straightforward calculation leads to the following formula for P(k|k-1):

P(k|k-1) = Φ(k,k-1)P(k-1|k-1)Φ'(k,k-1) + Γ(k,k-1)Q(k-1)Γ'(k,k-1)  (16-11)

where k = 1, 2, ....

Observe, from (16-4) and (16-11), that x̂(0|0) and P(0|0) initialize the single-stage predictor and its error-covariance. Additionally,

x̂(0|0) = E{x(0)|no measurements} = m_x(0)  (16-12)

and

P(0|0) = E{x̃(0|0)x̃'(0|0)} = E{[x(0) - m_x(0)][x(0) - m_x(0)]'} = P_x(0)  (16-13)

Finally, recall (Property 4 of Lesson 13) that both x̂(k|k-1) and x̃(k|k-1) are Gaussian.

A GENERAL STATE PREDICTOR

In this section we generalize the results of the preceding section so as to obtain predicted values of x(k) that look further into the future than just one step. We shall determine x̂(k|j), where k > j, under the assumption that filtered state estimate x̂(j|j) and its error-covariance matrix E{x̃(j|j)x̃'(j|j)} = P(j|j) are known for some j = 0, 1, ....

Theorem 16-1
a. If input u(k) is deterministic, or does not depend on any measurements, then the mean-squared predicted estimator of x(k), x̂(k|j), is given by the expression

x̂(k|j) = Φ(k,j)x̂(j|j) + Σ_{i=j+1}^{k} Φ(k,i)Ψ(i,i-1)u(i-1),  k > j  (16-14)

Before proving this theorem, let us observe that the prediction formula (16-14) is intuitively what one would expect. Why is this so? Suppose we have processed all of the measurements z(1), z(2), ..., z(j) to obtain x̂(j|j) and are asked to predict the value of x(k), where k > j. No additional measurements can be used during prediction. All that we can therefore use is our dynamical state equation. When that equation is used for purposes of prediction we neglect the random disturbance term, because the disturbances are not measurable. We can only use measured quantities to assist our prediction efforts. The simplified state equation is

x(k+1) = Φ(k+1,k)x(k) + Ψ(k+1,k)u(k)  (16-16)

a solution of which is

x(k) = Φ(k,j)x(j) + Σ_{i=j+1}^{k} Φ(k,i)Ψ(i,i-1)u(i-1)  (16-17)

Substituting x̂(j|j) for x(j), we obtain the predictor in (16-14). In our proof of Theorem 16-1 we establish (16-14) in a more rigorous manner.

Proof
a. The solution to state equation (15-17), for x(k), can be expressed in terms of x(j), where j < k, as

x(k) = Φ(k,j)x(j) + Σ_{i=j+1}^{k} Φ(k,i)[Γ(i,i-1)w(i-1) + Ψ(i,i-1)u(i-1)]  (16-18)

We apply the Fundamental Theorem of Estimation Theory to (16-18) by taking the conditional expectation with respect to Z(j) on both sides of it. Doing this, we find that

x̂(k|j) = Φ(k,j)x̂(j|j) + Σ_{i=j+1}^{k} Φ(k,i)[Γ(i,i-1)E{w(i-1)|Z(j)} + Ψ(i,i-1)E{u(i-1)|Z(j)}]  (16-19)

Note that Z(j) depends at most on x(j) which, in turn, depends at most on w(j-1). Consequently,

E{w(i-1)|Z(j)} = E{w(i-1)|w(0), w(1), ..., w(j-1)}  (16-20)
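For a scalar time-invariant system, Φ(k,j) collapses to φ^(k-j) and the general predictor (16-14) can be sketched directly; the numbers in the usage line are illustrative:

```python
# Sketch of the general state predictor (16-14) for a scalar time-invariant
# system: xhat(k|j) = phi^(k-j)*xhat(j|j) + sum_{i=j+1}^{k} phi^(k-i)*psi*u(i-1).

def predict(xhat_jj, j, k, phi, psi, u):
    """u is indexed so that u[i] holds the known input u(i)."""
    xhat = (phi ** (k - j)) * xhat_jj          # homogeneous part, Phi(k,j)*xhat(j|j)
    for i in range(j + 1, k + 1):
        xhat += (phi ** (k - i)) * psi * u[i - 1]   # driven part of (16-14)
    return xhat

# With zero input the predictor simply decays the filtered estimate:
print(predict(8.0, j=0, k=3, phi=0.5, psi=1.0, u=[0.0, 0.0, 0.0]))   # 1.0
```

Note that the disturbance term of (16-18) contributes nothing here, exactly as the intuitive argument above says: the predictor runs the deterministic part of the state equation forward from x̂(j|j).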
z̃(k+1|k) = H(k+1)x̃(k+1|k) + v(k+1)  (16-32)

b. The innovations is a zero-mean Gaussian white noise sequence, with

E{z̃(k+1|k)z̃'(k+1|k)} = P_z̃(k+1|k) = H(k+1)P(k+1|k)H'(k+1) + R(k+1)  (16-33)

In Lesson 17 the inverse of P_z̃(k+1|k) is needed; hence, we shall require that H(k+1)P(k+1|k)H'(k+1) + R(k+1) is nonsingular. This is usually true and will always be true if, as in our basic state-variable model, R(k+1) is positive definite.

Proof (Mendel, 1983b)
a. Substitute (16-28) into (16-29) in order to obtain (16-31). Next, substitute the measurement equation z(k+1) = H(k+1)x(k+1) + v(k+1) into (16-31), and use the fact that x̃(k+1|k) = x(k+1) - x̂(k+1|k), to obtain (16-32).
b. Because x̃(k+1|k) and v(k+1) are both zero mean, E{z̃(k+1|k)} = 0. The innovations is Gaussian because z(k+1) and x̂(k+1|k) are Gaussian, and, therefore, z̃(k+1|k) is a linear transformation of Gaussian random vectors.

PROBLEMS

16-1. Develop the counterpart to Theorem 16-1 for the case when input u(k) is random and independent of Z(k). What happens if u(k) is random and dependent upon Z(k)?
16-2. For the innovations process z̃(k+1|k), prove that E{z̃(i+1|i)z̃'(j+1|j)} = 0 when i < j.
16-3. In the proof of part (b) of Theorem 16-2 we make repeated use of the orthogonality principle, stated in Corollary 13-3. In the latter corollary f[Z(k)]
Lesson 17
State Estimation: Filtering (the Kalman Filter)

INTRODUCTION

In this lesson we shall develop the Kalman filter, which is a recursive mean-squared error filter for computing x̂(k+1|k+1), k = 0, 1, 2, .... As its name implies, this filter was developed by Kalman [circa 1959 (Kalman, 1960)].

From the Fundamental Theorem of Estimation Theory, Theorem 13-1, we know that

x̂(k+1|k+1) = E{x(k+1)|Z(k+1)}  (17-1)

Our approach to developing the Kalman filter is to partition Z(k+1) into two sets of measurements, Z(k) and z(k+1), and to then expand the conditional expectation in terms of data sets Z(k) and z(k+1), i.e.,

x̂(k+1|k+1) = E{x(k+1)|Z(k), z(k+1)}  (17-2)

What complicates this expansion is the fact that Z(k) and z(k+1) are statistically dependent. Measurement vector Z(k) depends on state vectors x(1), x(2), ..., x(k), because z(j) = H(j)x(j) + v(j) (j = 1, 2, ..., k). Measurement vector z(k+1) also depends on state vector x(k), because z(k+1) = H(k+1)x(k+1) + v(k+1) and x(k+1) = Φ(k+1,k)x(k) + Γ(k+1,k)w(k) + Ψ(k+1,k)u(k). Hence Z(k) and z(k+1) both depend on x(k) and are, therefore, dependent.
Recall that x(k+1), Z(k) and z(k+1) are jointly Gaussian random vectors; hence, we can use Theorem 12-4 to express (17-2) as

x̂(k+1|k+1) = E{x(k+1)|Z(k), z̃}  (17-3)

where

z̃ = z(k+1) - E{z(k+1)|Z(k)}  (17-4)

We immediately recognize z̃ as the innovations process z̃(k+1|k) [see (16-29)]; thus, we rewrite (17-3) as

x̂(k+1|k+1) = E{x(k+1)|Z(k), z̃(k+1|k)}  (17-5)

Applying (12-37) to (17-5), we find that

x̂(k+1|k+1) = E{x(k+1)|Z(k)} + E{x(k+1)|z̃(k+1|k)} - m_x(k+1)  (17-6)

We recognize the first term on the right-hand side of (17-6) as the single-stage predicted estimator of x(k+1), x̂(k+1|k); hence,

x̂(k+1|k+1) = x̂(k+1|k) + E{x(k+1)|z̃(k+1|k)} - m_x(k+1)  (17-7)

This equation is the starting point for our derivation of the Kalman filter.

Before proceeding further, we observe, upon comparison of (17-2) and (17-5), that our original conditioning on z(k+1) has been replaced by conditioning on the innovations process z̃(k+1|k). One can show that z̃(k+1|k) is computable from z(k+1), and that z(k+1) is computable from z̃(k+1|k); hence, it is said that z(k+1) and z̃(k+1|k) are causally invertible (Anderson and Moore, 1979). We explain this statement more carefully at the end of this lesson.

A PRELIMINARY RESULT

In our derivation of the Kalman filter, we shall determine that

x̂(k+1|k+1) = x̂(k+1|k) + K(k+1)z̃(k+1|k)  (17-8)

where K(k+1) is an n × m (Kalman) gain matrix. We will calculate the optimal gain matrix in the next section.

Here let us view (17-8) as the structure of an arbitrary recursive linear filter, which is written in so-called predictor-corrector format; i.e., the filtered estimate of x(k+1) is obtained by a predictor step, x̂(k+1|k), and a corrector step, K(k+1)z̃(k+1|k). The predictor step uses information from the state equation, because x̂(k+1|k) = Φ(k+1,k)x̂(k|k) + Ψ(k+1,k)u(k). The corrector step uses the new measurement available at t_{k+1}. The correction is proportional to the difference between that measurement and its best predicted value, ẑ(k+1|k). The following result provides us with the means for evaluating x̂(k+1|k+1) in terms of its error-covariance matrix P(k+1|k+1).

Preliminary Result. Filtering error-covariance matrix P(k+1|k+1) for the arbitrary linear recursive filter (17-8) is computed from the following equation:

P(k+1|k+1) = [I - K(k+1)H(k+1)]P(k+1|k)[I - K(k+1)H(k+1)]' + K(k+1)R(k+1)K'(k+1)  (17-9)

Proof. Substitute (16-32) into (17-8) and then subtract the resulting equation from x(k+1) in order to obtain

x̃(k+1|k+1) = [I - K(k+1)H(k+1)]x̃(k+1|k) - K(k+1)v(k+1)  (17-10)

Substitute this equation into P(k+1|k+1) = E{x̃(k+1|k+1)x̃'(k+1|k+1)} to obtain equation (17-9). As in the proof of Theorem 16-2, we have used the fact that x̃(k+1|k) and v(k+1) are independent to show that E{x̃(k+1|k)v'(k+1)} = 0. □

The state prediction-error covariance matrix P(k+1|k) is given by equation (16-11). Observe that (17-9) and (16-11) can be computed recursively, once gain matrix K(k+1) is specified, as follows: P(0|0) → P(1|0) → P(1|1) → P(2|1) → P(2|2) → ··· etc.

It is important to reiterate the fact that (17-9) is true for any gain matrix, including the optimal gain matrix given next in Theorem 17-1.

THE KALMAN FILTER

Theorem 17-1
a. The mean-squared filtered estimator of x(k+1), x̂(k+1|k+1), written in predictor-corrector format, is

x̂(k+1|k+1) = x̂(k+1|k) + K(k+1)z̃(k+1|k)  (17-11)

for k = 0, 1, ..., where x̂(0|0) = m_x(0), and z̃(k+1|k) is the innovations process [z̃(k+1|k) = z(k+1) - H(k+1)x̂(k+1|k)].
b. K(k+1) is an n × m matrix (commonly referred to as the Kalman gain matrix or weighting matrix) which is specified by the set of relations

K(k+1) = P(k+1|k)H'(k+1)[H(k+1)P(k+1|k)H'(k+1) + R(k+1)]⁻¹  (17-12)

P(k+1|k) = Φ(k+1,k)P(k|k)Φ'(k+1,k) + Γ(k+1,k)Q(k)Γ'(k+1,k)  (17-13)
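For a scalar system, one full predictor-corrector cycle of equations (17-11)-(17-13), together with the standard-form covariance update P(k+1|k+1) = [I - K(k+1)H(k+1)]P(k+1|k) of (17-14), can be sketched as follows; the numerical values in the usage line are illustrative:

```python
# One predictor-corrector cycle of the Kalman filter for the scalar system
# x(k+1) = phi*x(k) + w(k), z(k+1) = x(k+1) + v(k+1)  (Phi = phi, Gamma = H = 1).

def kalman_step(xhat, p, z_next, phi, q, r):
    # Predictor: single-stage prediction (16-4) and covariance (17-13)
    xhat_pred = phi * xhat
    p_pred = phi * p * phi + q
    # Gain: (17-12)
    k_gain = p_pred / (p_pred + r)
    # Corrector: (17-11), with innovations z(k+1) - xhat(k+1|k)
    xhat_new = xhat_pred + k_gain * (z_next - xhat_pred)
    # Filtering error covariance, standard form (17-14)
    p_new = (1.0 - k_gain) * p_pred
    return xhat_new, p_new, k_gain

xhat, p, k_gain = kalman_step(xhat=4.0, p=10.0, z_next=3.0, phi=0.5, q=20.0, r=5.0)
print(round(xhat, 4), round(p, 4), round(k_gain, 4))
```

Iterating this step over a measurement sequence produces x̂(k|k) for all k; note that the covariance and gain lines never touch the data, which is the "performance analysis" point made in the observations below.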
E{x(k+1)|z̃(k+1|k)} = m_x(k+1) + P_xz̃(k+1, k+1|k)P_z̃⁻¹(k+1|k)z̃(k+1|k)  (17-16)

We define gain matrix K(k+1) as

K(k+1) = P_xz̃(k+1, k+1|k)P_z̃⁻¹(k+1|k)  (17-17)

Substituting (17-16) and (17-17) into (17-7) we obtain the Kalman filter equation (17-11). Because x̂(k+1|k) = Φ(k+1,k)x̂(k|k) + Ψ(k+1,k)u(k), equation (17-11) must be initialized by x̂(0|0), which we have shown must equal m_x(0) [see Equation (16-12)].
b. In order to evaluate K(k+1) we must evaluate P_xz̃ and P_z̃. Matrix P_z̃ has been computed in (16-33). By definition of cross-covariance,

P_xz̃ = E{[x(k+1) - m_x(k+1)]z̃'(k+1|k)} = E{x(k+1)z̃'(k+1|k)}  (17-18)

because z̃(k+1|k) is zero-mean. Substituting (16-32) into this expression, we find that

P_xz̃ = E{x(k+1)x̃'(k+1|k)}H'(k+1)  (17-19)

because E{x(k+1)v'(k+1)} = 0. Finally, expressing x(k+1) as x̂(k+1|k) + x̃(k+1|k) and applying the orthogonality principle (13-15), we find that

P_xz̃ = P(k+1|k)H'(k+1)  (17-20)

Combining equations (17-20) and (16-33) into (17-17), we obtain equation (17-12) for the Kalman gain matrix. State prediction-error covariance matrix P(k+1|k) was derived in Lesson 16.

OBSERVATIONS ABOUT THE KALMAN FILTER

1. Figure 17-1 depicts the interconnection of our basic dynamical system [equations (15-17) and (15-18)] and the Kalman filter system. The feedback nature of the Kalman filter is quite evident. Observe, also, that the Kalman filter contains within its structure a model of the plant.

The feedback nature of the Kalman filter manifests itself in two different ways, namely in the calculation of x̂(k+1|k+1) and also in the calculation of the matrix of gains, K(k+1), both of which we shall explore below.

2. The predictor-corrector form of the Kalman filter is illuminating from an information-usage viewpoint. Observe that the predictor equations, which compute x̂(k+1|k) and P(k+1|k), use information only from the state equation, whereas the corrector equations, which compute K(k+1), x̂(k+1|k+1) and P(k+1|k+1), use information only from the measurement equation.

3. Once the gain matrix is computed, then (17-11) represents a time-varying recursive digital filter. This is seen more clearly when equations (16-4) and (16-31) are substituted into (17-11). The resulting equation can be rewritten as

x̂(k+1|k+1) = [I - K(k+1)H(k+1)]Φ(k+1,k)x̂(k|k) + K(k+1)z(k+1) + [I - K(k+1)H(k+1)]Ψ(k+1,k)u(k)  (17-22)

for k = 0, 1, .... This is a state equation for state vector x̂, whose time-varying plant matrix is [I - K(k+1)H(k+1)]Φ(k+1,k). Equation (17-22) is time-varying even if our dynamical system in equations (15-17) and (15-18) is time-invariant and stationary, because gain matrix K(k+1) is still time-varying in that case. It is possible, however, for K(k+1) to reach a limiting value, a situation examined in Lesson 19.
where we have omitted the temporal arguments of Φ, Γ, H, Q and R for notational simplicity [their correct arguments are Φ(k+1,k), Γ(k+1,k), H(k), Q(k) and R(k)]. The matrix Riccati equation for P(k+1|k+1) is obtained by substituting (17-12) into (17-14), and then (17-13) into the resulting equation. We leave its derivation as an exercise.

5. A measure of recursive predictor performance is provided by matrix P(k+1|k). This covariance matrix can be calculated prior to any processing of real data, using its matrix Riccati equation (17-24) or Equations (17-13), (17-14), and (17-12). A measure of recursive filter performance is provided by matrix P(k+1|k+1), and this covariance matrix can also be calculated prior to any processing of real data. Note that P(k+1|k+1) ≠ P(k+1|k). These calculations are often referred to as performance analyses. It is indeed interesting that the Kalman filter utilizes a measure of its mean-squared error during its real-time operation.

6. Two formulas are available for computing P(k+1|k+1), namely (17-14), which is known as the standard form, and (17-9), which is known as the stabilized form. Although the stabilized form requires more computations than the standard form, it is much less sensitive to numerical errors from the prior calculation of gain matrix K(k+1) than is the standard form. In fact, one can show that first-order errors in the calculation of K(k+1) propagate as first-order errors in the calculation of P(k+1|k+1) when the standard form is used, but only as second-order errors in the calculation of P(k+1|k+1) when the stabilized form is used. This is why (17-9) is called the stabilized form [for detailed derivations, see Aoki (1967), Jazwinski (1970), or Mendel (1973)].

7. On the subject of computation, the calculation of P(k+1|k) is the most costly one for the Kalman filter, because of the term Φ(k+1,k)P(k|k)Φ'(k+1,k), which entails two multiplications of two n × n matrices [i.e., P(k|k)Φ'(k+1,k) and Φ(k+1,k)(P(k|k)Φ'(k+1,k))]. Total computation for the two matrix multiplications is on the order of 2n³ multiplications and 2n³ additions [for more detailed computation counts, including storage requirements, for all of the Kalman filter equations, see Mendel (1971), Gura and Bierman (1971), and Bierman (1973a)]. One must be very careful to code the standard or stabilized forms for P(k+1|k+1) so that n × n matrices are never multiplied. We leave it to the reader to show that the standard algorithm can be coded in such a manner that it only requires on the order of (3/2)mn² multiplications, whereas the stabilized algorithm can be coded in such a manner that it only requires on the order of (9/2)mn² multiplications. In many applications, system order n is larger than the number of measurements m, so that mn² < n³. Usually, computation is most sensitive to system order; so, whenever possible, use low-order (but adequate) models.

8. Because of the equivalence between mean-squared, best linear unbiased, and weighted least-squares filtered estimates of our state vector x(k) (see Lesson 14), we must realize that our Kalman filter equations are just a recursive solution to a system of normal equations (see Lesson 3). Other implementations of the Kalman filter that solve the normal equations using stable algorithms from numerical linear algebra (see, e.g., Bierman, 1977) and involve orthogonal transformations have better numerical properties than (17-11)-(17-14). We reiterate, however, that because this is a book on estimation theory, theoretical formulas such as those in (17-11)-(17-14) are appropriate.

9. In Lesson 4 we developed two forms for a recursive least-squares estimator, namely, the covariance and information forms. Compare K(k+1) in (17-12) with the gain matrix in (4-25) to see that they have the same structure; hence, our formulation of the Kalman filter is often known as the covariance formulation. We leave it to the reader to show that K(k+1) can also be computed as

K(k+1) = P(k+1|k+1)H'(k+1)R⁻¹(k+1)  (17-25)

where P(k+1|k+1) is computed from

P⁻¹(k+1|k+1) = P⁻¹(k+1|k) + H'(k+1)R⁻¹(k+1)H(k+1)  (17-26)

When these equations are used along with (17-11) and (17-13) we have the information formulation of the Kalman filter. Of course, the orderings of the computations in these two formulations of the Kalman filter are different. See Lessons 4 and 9 for related discussions.

10. In Lesson 14 we showed that x̂_MAP(k|N) = x̂_MS(k|N); hence,

x̂_MAP(k|k) = x̂_MS(k|k)  (17-27)

This means that the Kalman filter also gives MAP estimates of the state vector x(k) for our basic state-variable model.

11. At the end of the introduction section in this lesson we mentioned that z(k+1) and z̃(k+1|k) are causally invertible. This means that we can compute one from the other using a causal (i.e., realizable) system. For example, when the measurements are available, then z̃(k+1|k) can be obtained from Equations (17-23) and (16-31), which we repeat here for the convenience of the reader:

x̂(k+1|k) = Φ(k+1,k)[I - K(k)H(k)]x̂(k|k-1) + Ψ(k+1,k)u(k) + Φ(k+1,k)K(k)z(k)  (17-28)

and

z̃(k+1|k) = -H(k+1)x̂(k+1|k) + z(k+1),  k = 0, 1, ...  (17-29)
PROBLEMS

17-1. Prove that x̃(k+1|k+1) is zero mean, Gaussian and first-order Markov.
17-2. Derive the recursive predictor form of the Kalman filter, given in (17-23).
17-3. Derive the matrix Riccati equation for P(k+1|k+1).
17-4. Show that gain matrix K(k+1) can also be computed using (17-25).
17-5. Suppose a small error δK is made in the computation of the Kalman filter gain K(k+1).
(a) Show that when P(k+1|k+1) is computed from the standard form equation, then to first-order terms,
δP(k+1|k+1) = -δK(k+1)H(k+1)P(k+1|k)
(b) Show that when P(k+1|k+1) is computed from the stabilized form equation, then to first-order terms,
δP(k+1|k+1) ≈ 0
17-6. Consider the basic scalar system x(k+1) = φx(k) + w(k) and z(k+1) = x(k+1) + v(k+1).
(a) Show that p(k+1|k) ≥ q, which means that the variance of the system disturbance sets the performance limit on prediction accuracy.
(b) Show that 0 ≤ K(k+1) ≤ 1.
(c) Show that 0 ≤ p(k+1|k+1) ≤ r.
17-7. An RC filter with time constant τ is excited by Gaussian white noise, and the output is measured every T seconds. The output at the sample times obeys the
17-9. Show that the standard algorithm for computing P(k+1|k+1) only requires on the order of (3/2)mn² multiplications, whereas the stabilized algorithm requires on the order of (9/2)mn² multiplications (use Table 17-1). Note that this last result requires a very clever coding of the stabilized algorithm.
Lesson 18
State Estimation: Filtering Examples

INTRODUCTION

In this lesson (which is an excellent one for self-study) we present five examples, which illustrate some interesting numerical and theoretical aspects of Kalman filtering.
EXAMPLES

Example 18-1

In Lesson 17 we learned that the Kalman filter is a dynamical feedback system. Its gain matrix and predicted- and filtering-error covariance matrices comprise a matrix feedback system operating within the Kalman filter. Of course, these matrices can be calculated prior to processing of data, and such calculations constitute a performance analysis of the Kalman filter. Here we examine the results of these calculations for two second-order systems, H₁(z) = 1/(z² − 1.32z + 0.875) and H₂(z) = 1/(z² − 1.36z + 0.923). The second system is less damped than the first. Impulse responses of both systems are depicted in Figure 18-1.

In Figure 18-1 we also depict p₁₁(k|k), p₂₂(k|k), K₁(k), and K₂(k) versus k for both systems. In both cases P(0|0) was set equal to the zero matrix. For system 1, q = 1 and r = 5, whereas for system 2, q = 1 and r = 20. Observe that the error variances and Kalman gains exhibit a transient response as well as a steady-state response; i.e., after a certain value of k (k ≈ 10 for system 1 and k ≈ 15 for system 2) p₁₁(k|k), p₂₂(k|k), K₁(k), and K₂(k) reach limiting values. These limiting values do not depend on P(0|0), as can be seen from Figure 18-2. The Kalman filter is initially influenced by its initial conditions, but eventually ignores them, paying much greater attention to model parameters and the measurements. The relatively large steady-state values for p₁₁ and p₂₂ are due to the large values of r. □
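The performance analysis described in Example 18-1 can be reproduced with a few lines of code. The companion-form realization of H₁(z) and the choice of measurement matrix below are illustrative assumptions (the example does not state them); the point is that K(k) and P(k|k) are precomputable without any data.

```python
import numpy as np

# Offline performance analysis for a system like system 1 of Example 18-1:
# H1(z) = 1/(z^2 - 1.32 z + 0.875) in (assumed) companion form, q = 1, r = 5.
Phi = np.array([[0.0, 1.0],
                [-0.875, 1.32]])
Gam = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])          # assumed measurement row vector
q, r = 1.0, 5.0

P = np.zeros((2, 2))                # P(0|0) = zero matrix, as in the example
gains, p11 = [], []
for _ in range(100):                # no measurements needed: pure recursion
    Pp = Phi @ P @ Phi.T + q * (Gam @ Gam.T)      # P(k+1|k)
    K = Pp @ H.T / (H @ Pp @ H.T + r)             # K(k+1)
    P = (np.eye(2) - K @ H) @ Pp                  # P(k+1|k+1)
    gains.append(K[0, 0])
    p11.append(P[0, 0])
```

After a transient of roughly ten steps the gain and error variance settle to limiting values that, as Figure 18-2 indicates, do not depend on P(0|0).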
Example 18-2

A state estimate is implicitly conditioned on knowing the true values for all system parameters (i.e., Φ, Γ, Ψ, H, Q, and R). Sometimes we do not know these values exactly; hence, it is important to learn how sensitive (i.e., robust) the Kalman filter is to parameter errors. Many references which treat Kalman filter sensitivity issues can be found in Mendel and Gieseking (1971) under category 2f, State Estimation: Sensitivity Considerations.

Let θ denote any parameter that may appear in Φ, Γ, Ψ, H, Q, or R. In order to determine the sensitivity of the Kalman filter to small variations in θ, one computes ∂x̂(k+1|k+1)/∂θ, whereas, for large variations in θ, one computes Δx̂(k+1|k+1)/Δθ. An analysis of Δx̂(k+1|k+1)/Δθ, for example, reveals an interesting chain of events: Δx̂(k+1|k+1)/Δθ depends on ΔK(k+1)/Δθ, which in turn depends on ΔP(k+1|k)/Δθ, which in turn depends on ΔP(k|k)/Δθ. Hence, for each variable parameter, θ, we have a Kalman filter sensitivity system comprised of equations from which we compute Δx̂(k+1|k+1)/Δθ (see also Lesson 26). An alternative to using these equations is to perform a computer perturbation study.

Observe that the sensitivity functions vary with time and that they all reach steady-state values. Table 18-1 summarizes the steady-state sensitivity coefficients of K(k+1), P(k|k), and P(k+1|k).

Some conclusions that can be drawn from these numerical results are: (1) K(k+1), P(k|k), and P(k+1|k) are most sensitive to changes in parameter b, and
(2) the steady-state sensitivity coefficients of K(k+1) and P(k+1|k+1) are equal for θ = a, b, and q. This last observation could have been foreseen, because of our alternate equation for K(k+1) [Equation (17-25)]:

K(k+1) = P(k+1|k+1)H′(k+1)R⁻¹(k+1)    (17-25)

This expression shows that if H and R are fixed, then K(k+1) varies exactly the same way as P(k+1|k+1).
[Figures: sensitivity functions versus k (k = 0, 1, . . . , 10) for parameter perturbations of ±5%, ±10%, ±20%, and ±50%.]
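The computer perturbation study mentioned in Example 18-2 is easy to set up. The scalar model and the nominal values below are illustrative assumptions, not taken from the book; the study perturbs one parameter (here a, the state-transition coefficient) by the same percentages used in the figures and forms the large-variation sensitivity ΔK/Δa.

```python
# Large-variation sensitivity study for the (assumed) scalar model
# x(k+1) = a x(k) + w(k), z(k+1) = x(k+1) + v(k+1).
def steady_gain(a, q, r, n=300):
    """Iterate the gain/covariance recursions to their steady-state gain."""
    p, K = 0.0, 0.0                  # p(0|0)
    for _ in range(n):
        pp = a * a * p + q           # p(k+1|k)
        K = pp / (pp + r)            # K(k+1)
        p = (1.0 - K) * pp           # p(k+1|k+1)
    return K

a, q, r = 0.9, 1.0, 5.0              # assumed nominal values
K_nom = steady_gain(a, q, r)
sens = {eps: (steady_gain(a * (1 + eps), q, r) - K_nom) / (a * eps)
        for eps in (0.05, 0.10, 0.20, 0.50)}
```

Each entry of `sens` approximates the steady-state ΔK/Δa at the given perturbation level; comparing the levels shows how the large-variation sensitivity departs from the small-variation (derivative) value.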
Our example is adapted from Jazwinski (1970, pp. 302-303). We begin with the following simple first-order system

x(k+1) = x(k) + b    (18-15)

z(k+1) = x(k+1) + v(k+1)    (18-16)

where b is a very small bias, so small that, when we design our Kalman filter, we choose to neglect it. Our Kalman filter is based on the following model:

x_m(k+1) = x_m(k)    (18-17)

z(k+1) = x_m(k+1) + v(k+1)    (18-18)

Using this model we estimate x(k) and p_m(k|k), where it is straightforward to show that

K(k+1) = p(0|0)/[(k+1)p(0|0) + r]    (18-19)

Observe that, as k → ∞, K(k+1) → 0, so that x̂_m(k+1|k+1) → x̂_m(k|k). The Kalman filter is rejecting the new measurements because it believes (18-17) to be the true model for x(k); but, of course, it is not the true model.

The Kalman filter computes the error variance, p_m(k|k), between x̂_m(k|k) and x_m(k). The true error variance is associated with x̃(k|k), where

x̃(k|k) = x(k) − x̂_m(k|k)    (18-20)

We leave it to the reader to derive the expression for x̃(k|k) given in (18-21). As k → ∞, x̃(k|k) → ∞, because the third term on the right-hand side of (18-21) diverges to infinity. This term contains the bias b that was neglected in the model used by the Kalman filter. Note also that x̃_m(k|k) = x_m(k) − x̂_m(k|k) → 0 as k → ∞; thus, the Kalman filter has locked onto the wrong state and is unaware that the true error variance is diverging.

A number of different remedies have been proposed for controlling divergence effects, including:

1. adding fictitious process noise,
2. finite-memory filtering, and
3. fading-memory filtering.

Fictitious process noise, which appears in the state equation, can be used to account for neglected modeling effects that enter into the state equation (e.g., truncation of second- and higher-order effects when a nonlinear state equation is linearized, as described in Lesson 24). This process noise introduces Q into the Kalman filter equations; observe, in our first-order example, that Q does not appear in the equations for x̂_m(k+1|k+1) or x̃(k|k), because state equation (18-17) contains no process noise.

Divergence is a large-sample property of the Kalman filter. Finite-memory and fading-memory filtering control divergence by not letting the Kalman filter get into its large-sample regime. Finite-memory filtering (Jazwinski, 1970) uses a finite window of measurements (of fixed length W) to estimate x(k). As we move from t = k₁ to t = k₁ + 1, we must account for two effects, namely, the new measurement at t = k₁ + 1 and a discarded measurement at t = k₁ − W. Fading-memory filtering, due to Sorenson and Sacks (1971), exponentially ages the measurements, weighting the recent measurement most heavily and past measurements much less heavily. It is analogous to weighted least squares, as described in Lesson 3.

Fading-memory filtering seems to be the most successful and popular way to control divergence effects. □

PROBLEMS

18-1. Derive the equations for x̂_m(k+1|k+1) and x̃(k|k), in (18-19) and (18-21), respectively.
18-2. In Lesson 5 we described cross-sectional processing for weighted least-squares estimates. Cross-sectional (also known as sequential) processing can be performed in Kalman filtering. Suppose z(k+1) = col(z₁(k+1), z₂(k+1), . . . , z_q(k+1)), where zᵢ(k+1) = Hᵢ(k+1)x(k+1) + vᵢ(k+1), the vᵢ(k+1) are mutually uncorrelated for i = 1, 2, . . . , q, zᵢ(k+1) is mᵢ × 1, and m₁ + m₂ + ⋯ + m_q = m. Let x̂ᵢ(k+1|k+1) be a corrected estimate of x(k+1) that is associated with processing zᵢ(k+1).
    (a) Using the Fundamental Theorem of Estimation Theory, prove that a cross-sectional structure for the corrector equation of the Kalman filter is:
        x̂₁(k+1|k+1) = x̂(k+1|k) + E{x(k+1)|z̃₁(k+1|k)}
        x̂₂(k+1|k+1) = x̂₁(k+1|k+1) + E{x(k+1)|z̃₂(k+1|k)}
        ⋮
        x̂_q(k+1|k+1) = x̂_{q−1}(k+1|k+1) + E{x(k+1)|z̃_q(k+1|k)} = x̂(k+1|k+1)
    (b) Provide equations for computing E{x(k+1)|z̃ᵢ(k+1|k)}.
18-3. (Project) Choose a second-order system and perform a thorough sensitivity study of its associated Kalman filter. Do this for various nominal values and for both small and large variations of the system's parameters. You will need a computer for this project. Present the results both graphically and tabularly, as in Example 18-2. Draw as many conclusions as possible.
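The divergence mechanism of the Jazwinski example above can be reproduced numerically; the values of b, r, and p(0|0) below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truth: x(k+1) = x(k) + b, eq. (18-15); filter model: x_m(k+1) = x_m(k),
# eq. (18-17), which neglects the bias and contains no process noise.
b, r, p0 = 0.01, 1.0, 1.0            # assumed values
x, xhat, p = 0.0, 0.0, p0
true_err = []
for _ in range(2000):
    x += b                           # true state drifts
    z = x + rng.normal(0.0, np.sqrt(r))
    K = p / (p + r)                  # p(k+1|k) = p(k|k): no process noise
    xhat += K * (z - xhat)           # x_m-hat(k+1|k+1)
    p *= (1.0 - K)                   # filter's own error variance -> 0
    true_err.append(x - xhat)
```

The filter's computed variance p goes to zero, so K(k+1) → 0 and new measurements are rejected, while the true error x(k) − x̂_m(k|k) grows roughly like kb/2: divergence. Adding fictitious process noise q_f > 0 to the p-recursion keeps K bounded away from zero and arrests the growth.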
TABLE 19-1 Kalman Filter Quantities for Example 19-1

k | p(k|k−1) | K(k)  | p(k|k)
0 |    —     |  —    | 50
1 |   70     | 0.933 | 4.67
2 |  24.67   | 0.831 | 4.16
3 |  24.14   | 0.829 | 4.14
4 |  24.14   | 0.828 | 4.14

The steady-state value of p̄_f is obtained by setting p(k+1|k+1) = p(k|k) = p̄_f in the last of the above three relations, to obtain

p̄_f² + 20p̄_f − 100 = 0

whose two solutions are p̄_f = 4.14 and p̄_f = −24.14. Because p̄_f is a variance it must be nonnegative; hence, only the solution p̄_f = 4.14 is valid. Comparing this result with p(3|3), we see that the filter is in the steady state, to within the indicated computational accuracy, after processing just three measurements. In steady state, the Kalman filter equation is

x̂(k+1|k+1) = x̂(k|k) + 0.828[z(k+1) − x̂(k|k)]
            = 0.172 x̂(k|k) + 0.828 z(k+1)    (19-9)

Observe that the filter's pole at 0.172 lies inside the unit circle. □

Many ways have been reported for solving the algebraic Riccati equation (19-1) [see Laub (1979), for example], ranging from direct iteration of the matrix Riccati equation (17-24) until P(k+1|k) does not change appreciably from P(k|k−1), to solving the nonlinear algebraic Riccati equation via an iterative Newton-Raphson procedure, to solving that equation in one shot by the Schur method. Iterative methods are quite sensitive to error accumulation. The one-shot Schur method possesses a high degree of numerical integrity, and appears to be one of the most successful ways for obtaining P̄. For details about this method, see Laub (1979).

In summary, then, to design a steady-state Kalman filter:

1. given (Φ, Γ, Ψ, H, Q, R), compute P̄, the positive definite solution of (19-1);
2. compute K̄, as

   K̄ = P̄H′(HP̄H′ + R)⁻¹    (19-10)

3. use (19-10) in

   x̂(k+1|k+1) = Φx̂(k|k) + Ψu(k) + K̄z̃(k+1|k)
               = (I − K̄H)Φx̂(k|k) + K̄z(k+1) + (I − K̄H)Ψu(k)    (19-11)

SINGLE-CHANNEL STEADY-STATE KALMAN FILTER

The steady-state Kalman filter can be viewed outside of the context of estimation theory as a recursive digital filter. As such, it is sometimes useful to be able to compute its impulse response, transfer function, and frequency response. In this section we restrict our attention to the single-channel steady-state Kalman filter. From (19-11) we observe that this filter is excited by two inputs, z(k+1) and u(k). In this section we shall only be interested in transfer functions which are associated with the effect of z(k+1) on the filter; hence, we set u(k) = 0. Additionally, we shall view the signal component of measurement z(k) as our desired output. The signal component of z(k) is hx(k).

Let H_f(z) denote the z-transform of the impulse response of the steady-state filter; i.e.,

H_f(z) = Z{ŝ(k|k) when z(k) = δ(k)}    (19-12)

This transfer function is found from the following steady-state filter system:

x̂(k|k) = [I − k̄h]Φx̂(k−1|k−1) + k̄z(k)    (19-13)

ŝ(k|k) = hx̂(k|k)    (19-14)

Taking the z-transform of (19-13) and (19-14), it follows that

H_f(z) = h(I − Φ_f z⁻¹)⁻¹k̄    (19-15)

where

Φ_f = (I − k̄h)Φ    (19-16)

Equation (19-15) can also be written as

H_f(z) = h_f(0) + h_f(1)z⁻¹ + h_f(2)z⁻² + ⋯    (19-17)

where the filter's coefficients (i.e., Markov parameters), h_f(j), are

h_f(j) = hΦ_f^j k̄    (19-18)

for j = 0, 1, . . . .

In our study of mean-squared smoothing, in Lessons 20, 21, and 22, we will see that the steady-state predictor system plays an important role. The steady-state predictor, obtained from (17-23), is given by

x̂(k+1|k) = Φ_p x̂(k|k−1) + γ_p z(k)    (19-19)

where

Φ_p = Φ(I − k̄h)    (19-20)

and

γ_p = Φk̄    (19-21)
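The three design steps can be checked against Example 19-1 (Φ = 1, q = 20, r = 5), using direct iteration of the Riccati equation for step 1 — the simplest, though, as noted above, not the most numerically robust, of the available methods.

```python
# Steady-state Kalman filter design for the scalar system of Example 19-1:
# x(k+1) = x(k) + w(k), z(k+1) = x(k+1) + v(k+1), q = 20, r = 5.
q, r = 20.0, 5.0
p = 0.0                                # p(0|0); the limit does not depend on it
for _ in range(100):                   # step 1: iterate the Riccati equation
    pp = p + q                         # p(k+1|k) = p(k|k) + q   (phi = 1)
    p = pp * r / (pp + r)              # p(k+1|k+1)
pbar = p + q                           # steady-state predicted variance P-bar
K = pbar / (pbar + r)                  # step 2: K-bar, eq. (19-10)
pole = 1.0 - K                         # step 3: pole of the filter (19-9)
print(round(p, 2), round(K, 3), round(pole, 3))   # → 4.14 0.828 0.172
```

The iteration reproduces p̄_f = 4.14, K̄ = 0.828, and the filter pole 0.172 of Example 19-1 after only a handful of passes.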
State Estimation: Kalman Filter and Wiener Filter Lesson 19 Single-Channel Steady-State Kalman Filter 175
Let HP (z) denote the z-transform of the impulse responseof the steady- Table 19-2 summarizes 7 /r , r, C&and yP , quantities which are needed to compute the
state predictor, i.e., impulse response I+(k) for k 2 0, which is depicted in Figure 19-l. Observe that all
three responses peak at j = 1; however, the decay time for SNR = 20 is quicker than
H,(z) = Cif{f(klk - 1) when z(k) = 6(k)} (19-22)
TABLE 19-2 Steady-StatePredictor Quantities
This transfer function is found from the following steady-state predictor system: SNR p/r K 4 YP
%(k + Ilk) = @$(klk - 1) + y/J(k) (19-23) 20 20.913 1.287 0.064 0.910
5 5.742 1.049 0.183 0.742
i(klk - 1) = hk(klk - 1) (19-24) 1 1.414 0.586 0.414 0.414
Taking the z-transform of theseequations, we find that
H,(z) = h(I - $,z-)-z-yP (19-25)
which can also be written as 700
HP(z) = h,(l)z- + hP(2)z-2 + . . l (19-26)
where the predictors coefficients,hJj), are
hP(j) = hW;- yP (19-27)
SNR=20 (0) H,(z)=OA43z-+O.O41z-*
for j = 1,2,.... + o.003z-3
Although Hf(z) and H,(z) contain an infinite number of terms, they can
usually be truncated, becauseboth are associatedwith asymptotically stable
filters.
SNR = 5 (a) H,(z) = 0.524~- 1+ 0.096~-2
Example 19-2 + 0.008z-3+0.001z-4
In this example we examine the response of the steady-statepredictor
first-order system
(19-30)
Note that we could just as well have solved (19-l) for p /q but the structure of its
equation is more complicated than (19-30). The positive solution of (19-30) is
FJ Z(SNR-1)++v(SNR-1)+8SNR
- (19-31)
r
0 1 2 3 4 5 6 7 8
Additionally,
j
(19-32) Figure 19-1 Impulseresponse of steady-state predictor for SW? = 20, 5 and 1.
H,(z) is shown to three significant figures (Mendel, 1981, 0 1981, IEEE).
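The predictor coefficients (19-27) are easy to generate once Φ_p and γ_p are in hand. The first-order system below (φ = 0.9, h = 1, q = r = 1) is an illustrative assumption, not the system of Example 19-2.

```python
# Markov parameters of a steady-state predictor, h_p(j) = h Phi_p^(j-1) gamma_p,
# eq. (19-27), for an assumed first-order system.
phi, h, q, r = 0.9, 1.0, 1.0, 1.0
pbar = 0.0                                  # steady-state prediction variance
for _ in range(200):
    pbar = phi**2 * pbar * r / (pbar + r) + q
K = pbar / (pbar + r)                       # steady-state gain k-bar
Phi_p = phi * (1.0 - K)                     # eq. (19-20)
gamma_p = phi * K                           # eq. (19-21)
hp = [h * Phi_p**(j - 1) * gamma_p for j in range(1, 9)]   # h_p(1), ..., h_p(8)
```

Because |Φ_p| < 1, h_p(j) decays geometrically, which is why H_p(z) can be truncated after a few terms.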
(19-33)

and

H_{P₂}(z) = (−0.688z³ + 1.651z² − 1.221z + 0.25)/(z⁴ − 2.586z³ + 2.489z² − 1.033z + 0.168)    (19-34)

Figure 19-2 depicts h_{P₁}(k), |H_{P₁}(jω)| (in dB), and ∠H_{P₁}(jω), as well as h_{P₂}(k), |H_{P₂}(jω)|, and ∠H_{P₂}(jω), for SNR = 1, 5, and 20. Figure 19-3 depicts comparable quantities for the second system. Observe that, as signal-to-noise ratio decreases, the steady-state predictor rejects the measurements; for the amplitudes of h_{P₁}(k) and h_{P₂}(k) become smaller as SNR becomes smaller. It also appears, from examination of |H_{P₁}(jω)| and |H_{P₂}(jω)|, that at high signal-to-noise ratios the steady-state predictor behaves like a high-pass filter for system 1 and a bandpass filter for system 2. On the other hand, at low signal-to-noise ratios it appears to behave like a band-pass filter.

The steady-state predictor appears to be quite dependent on system dynamics at high signal-to-noise ratios, but is much less dependent on the system dynamics at low signal-to-noise ratios. At low signal-to-noise ratios the predictor is rejecting the measurements regardless of system dynamics.

Just because the steady-state predictor behaves like a high-pass filter at high signal-to-noise ratios does not mean that it passes a lot of noise through it, because high signal-to-noise ratio means that measurement noise level is quite low. Of course, a spurious burst of noise would pass through this filter quite easily. □

Figure 19-2 Impulse and frequency responses for system H₁(z) and its associated steady-state predictor.
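The band-shape claims can be spot-checked from (19-34) by evaluating H_{P₂}(z) on the unit circle and comparing the gain at ω = 0 with the gain at ω = π:

```python
import numpy as np

num = np.array([-0.688, 1.651, -1.221, 0.25])        # numerator of (19-34)
den = np.array([1.0, -2.586, 2.489, -1.033, 0.168])  # denominator of (19-34)

def Hp2(w):
    """Frequency response at normalized frequency w (radians/sample)."""
    z = np.exp(1j * w)
    return np.polyval(num, z) / np.polyval(den, z)

dc, nyq = abs(Hp2(0.0)), abs(Hp2(np.pi))
```

The gain at ω = π exceeds the gain at ω = 0 by roughly a factor of 2.5, consistent with the non-low-pass behavior described above.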
RELATIONSHIPS BETWEEN THE STEADY-STATE KALMAN FILTER AND A FINITE IMPULSE RESPONSE DIGITAL WIENER FILTER

The steady-state Kalman filter is a recursive digital filter with filter coefficients equal to h_f(j), j = 0, 1, . . . [see (19-18)]. Quite often h_f(j) ≈ 0 for j ≥ J, so that H_f(z) can be truncated; i.e.,

Ĥ_f(z) = h_f(0) + h_f(1)z⁻¹ + ⋯ + h_f(J)z⁻ᴶ    (19-35)

The truncated steady-state Kalman filter can then be implemented as a finite-impulse response (FIR) digital filter.

There is a more direct way for designing a FIR minimum mean-squared error filter, i.e., a digital Wiener filter, as we describe next.

Consider the situation depicted in Figure 19-4 (an FIR digital Wiener filter). We wish to design digital filter F(z)'s coefficients f(0), f(1), . . . , f(η) so that the filter's output, y(k), is close, in some sense, to a desired signal d(k). In a digital Wiener filter design, f(0), f(1), . . . , f(η) are obtained by minimizing the following mean-squared error:

I(f) = E{[d(k) − y(k)]²}    (19-36)

Using the fact that

y(k) = f(k)*z(k) = Σ_{i=0}^{η} f(i)z(k − i)    (19-37)

we see that

Σ_{i=0}^{η} f(i)φ_z(i − j) = φ_{zd}(j),  j = 0, 1, . . . , η    (19-39)

where φ_z(·) is the autocorrelation function of filter input z(k), and φ_{zd}(·) is the cross-correlation function between z(k) and d(k). Equations (19-39) are known as the discrete-time Wiener-Hopf equations. They are a system of normal equations, and can be solved in many different ways, the fastest of which is by the Levinson algorithm (Treitel and Robinson, 1966).

The minimum mean-squared error, I*(f), can be shown to be given by (19-40). One property of the digital Wiener filter is that I*(f) becomes smaller as η, the number of filter coefficients, increases. In general, I*(f) approaches a nonzero limiting value, a value that is often reached for modest values of η.

In order to relate this FIR Wiener filter to the truncated steady-state Kalman filter, we must first assume a signal-plus-noise model for z(k); i.e. (see Figure 19-5),

z(k) = s(k) + v(k) = h(k)*w(k) + v(k)    (19-41)

where h(k) is the impulse response of a linear time-invariant system and, as in our basic state-variable model (Lesson 15), w(k) and v(k) are mutually uncorrelated (stationary) white noise processes with variances q and r, respectively. We must also specify an explicit form for desired signal d(k). We shall require that

d(k) = s(k) = h(k)*w(k)    (19-42)

Figure 19-3 Impulse and frequency responses for system H₂(z) and its associated steady-state predictor.
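Equations (19-39) can be solved with any linear-equation routine; the coefficient matrix [φ_z(i − j)] is symmetric Toeplitz, which is exactly the structure the Levinson algorithm exploits. The correlation values below are illustrative assumptions chosen so that the answer is checkable by hand: since φ_zd = 0.8 φ_z here, the solution is f = (0.8, 0, 0, 0).

```python
import numpy as np

phi_z = np.array([1.0, 0.5, 0.25, 0.125])   # phi_z(0), ..., phi_z(3)  (assumed)
phi_zd = 0.8 * phi_z                        # phi_zd(j)  (assumed)
eta = 3

# Normal equations (19-39): sum_i f(i) phi_z(i - j) = phi_zd(j), j = 0..eta.
T = np.array([[phi_z[abs(i - j)] for i in range(eta + 1)]
              for j in range(eta + 1)])     # symmetric Toeplitz matrix
f = np.linalg.solve(T, phi_zd)
```

A plain solve is used here for clarity; for long filters one would use a Levinson-type Toeplitz solver instead, which runs in O(η²) rather than O(η³).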
PROBLEMS

19-1. Derive the discrete-time Wiener-Hopf equations given in (19-39).
19-2. Prove that the minimum mean-squared error, I*(f), given in (19-40), becomes smaller as η, the number of filter coefficients, increases.
19-3. Consider the basic state-variable model x(k+1) = ½x(k) + w(k) and z(k+1) = x(k+1) + v(k+1), where q = 2, r = 1, m_x(0) = 0, and E{x²(0)} = 2.
    (a) Specify x̂(0|0) and p(0|0).
    (b) Give the recursive predictor for this system.
    (c) Obtain the steady-state predictor.
    (d) Suppose z(5) is not provided (i.e., there is a gap in the measurements at k = 5). How does this affect p(6|5) and p(100|99)?
19-4. Consider the basic scalar system x(k+1) = φx(k) + w(k) and z(k+1) = x(k+1) + v(k+1). Assume that q = 0 and let lim_{k→∞} p(k|k) = p₁.
    (a) Show that p₁ = 0 and p₁ = (φ² − 1)r/φ².
    (b) Which of the two solutions in (a) is the correct one when |φ| < 1?

Lesson 20
State Estimation: Smoothing
The literature on smoothing is filled with many different approaches for deriving recursive smoothers. By augmenting suitably defined states to (15-17) and (15-18), one can reduce the derivation of smoothing formulas to a Kalman filter for the augmented state-variable model (Anderson and Moore, 1979). The filtered estimates of the newly introduced states turn out to be equivalent to smoothed values of x(k). We shall examine this augmentation approach in Lesson 21. A second approach is to use the orthogonality principle to derive a discrete-time Wiener-Hopf equation, which can then be used to establish the smoothing formulas. We do not treat this approach in this book. A third approach, the one we shall follow in this lesson, is based on the causal invertibility between the innovations process z̃(k+j|k+j−1) and measurement z(k+j), and repeated applications of Theorem 12-4.

A SUMMARY OF IMPORTANT FORMULAS

The following formulas, which have been derived in earlier lessons, are used so frequently in this lesson, as well as in Lesson 21, that we collect them here for the convenience of the reader.

SINGLE-STAGE SMOOTHER

Theorem 20-1. The single-stage mean-squared smoothed estimator of x(k), x̂(k|k+1), is given by the expression

x̂(k|k+1) = x̂(k|k) + M(k|k+1)z̃(k+1|k)    (20-7)

where single-stage smoother gain matrix M(k|k+1) is

M(k|k+1) = P(k|k)Φ′(k+1,k)H′(k+1)[H(k+1)P(k+1|k)H′(k+1) + R(k+1)]⁻¹    (20-8)

Proof. From the Fundamental Theorem of Estimation Theory and Theorem 12-4, we know that

x̂(k|k+1) = E{x(k)|Z(k+1)}
         = E{x(k)|Z(k), z(k+1)}
         = E{x(k)|Z(k), z̃(k+1|k)}
         = E{x(k)|Z(k)} + E{x(k)|z̃(k+1|k)} − m_x(k)    (20-9)

which can also be expressed as (see Corollary 13-1)

x̂(k|k+1) = x̂(k|k) + P_{xz̃}(k, k+1|k)P_z̃⁻¹(k+1|k)z̃(k+1|k)    (20-10)

Defining single-stage smoother gain matrix M(k|k+1) as

M(k|k+1) = P_{xz̃}(k, k+1|k)P_z̃⁻¹(k+1|k)    (20-11)

(20-10) reduces to (20-7).

Next, we must show that M(k|k+1) can also be expressed as in (20-8). We already have an expression for P_z̃(k+1|k), namely (20-4); hence, we
must compute P_{xz̃}(k, k+1|k). To do this, we make use of (20-3), (20-5), and the orthogonality principle, as follows:

P_{xz̃}(k, k+1|k) = E{x(k)z̃′(k+1|k)}
 = E{x(k)[H(k+1)x̃(k+1|k) + v(k+1)]′}
 = E{x(k)x̃′(k+1|k)}H′(k+1)
 = E{x(k)[Φ(k+1,k)x̃(k|k) + Γ(k+1,k)w(k)]′}H′(k+1)
 = E{x(k)x̃′(k|k)}Φ′(k+1,k)H′(k+1)
 = E{[x̂(k|k) + x̃(k|k)]x̃′(k|k)}Φ′(k+1,k)H′(k+1)
 = P(k|k)Φ′(k+1,k)H′(k+1)    (20-12)

Substituting (20-12) and (20-4) into (20-11), we obtain (20-8). □

For future reference, we record the following fact:

E{x(k)x̃′(k+1|k)} = P(k|k)Φ′(k+1,k)    (20-13)

This is obtained by comparing the third and last lines of (20-12).

Observe that the structure of the single-stage smoother is quite similar to that of the Kalman filter. The Kalman filter obtains x̂(k+1|k+1) by adding a correction that depends on the most recent innovations, z̃(k+1|k), to the predicted value of x(k+1). The single-stage smoother, on the other hand, obtains x̂(k|k+1) by adding a correction that also depends on z̃(k+1|k), to the filtered value of x(k). We see that filtered estimates are required to obtain smoothed estimates.

Corollary 20-1. Kalman gain matrix K(k+1) is a factor in M(k|k+1); i.e.,

M(k|k+1) = A(k)K(k+1)    (20-14)

where

A(k) ≜ P(k|k)Φ′(k+1,k)P⁻¹(k+1|k)    (20-15)

Proof. Using the fact that [Equation (17-12)]

K(k+1) = P(k+1|k)H′(k+1)[H(k+1)P(k+1|k)H′(k+1) + R(k+1)]⁻¹    (20-16)

we see that

H′(k+1)[H(k+1)P(k+1|k)H′(k+1) + R(k+1)]⁻¹ = P⁻¹(k+1|k)K(k+1)    (20-17)

When (20-17) is substituted into (20-8), we obtain (20-14). □

Corollary 20-2. Another way to express x̂(k|k+1) is

x̂(k|k+1) = x̂(k|k) + A(k)[x̂(k+1|k+1) − x̂(k+1|k)]    (20-18)

Proof. Substitute (20-14) into (20-7) to see that

x̂(k|k+1) = x̂(k|k) + A(k)K(k+1)z̃(k+1|k)    (20-19)

but [see (17-11)]

K(k+1)z̃(k+1|k) = x̂(k+1|k+1) − x̂(k+1|k)    (20-20)

Substitute (20-20) into (20-19) to obtain the desired result in (20-18). □

Formula (20-7) is useful for computational purposes, whereas (20-18) is most useful for theoretical purposes. These facts will become more clear when we examine double-stage smoothing in our next section.

Whereas the structure of the single-stage smoother is similar to that of the Kalman filter, we see that M(k|k+1) does not depend on single-stage smoothing error-covariance matrix P(k|k+1). Kalman gain K(k+1), of course, does depend on P(k+1|k) [or P(k|k)]. In fact, P(k|k+1) does not appear at all in the smoothing equations and must be computed (if one desires to do so) separately. We address this calculation in Lesson 21.

DOUBLE-STAGE SMOOTHER

Instead of immediately generalizing the single-stage smoother to an N-stage smoother, we first present results for the double-stage smoother. We will then be able to write down the general results (almost) by inspection of the single- and double-stage results.

Theorem 20-2. The double-stage mean-squared smoothed estimator of x(k), x̂(k|k+2), is given by the expression

x̂(k|k+2) = x̂(k|k+1) + M(k|k+2)z̃(k+2|k+1)    (20-21)

where double-stage smoother gain matrix, M(k|k+2), is

M(k|k+2) = P(k|k)Φ′(k+1,k)[I − K(k+1)H(k+1)]′Φ′(k+2,k+1)H′(k+2)[H(k+2)P(k+2|k+1)H′(k+2) + R(k+2)]⁻¹    (20-22)

Proof. From the Fundamental Theorem of Estimation Theory, Theorem 12-4, and Corollary 13-1, we know that

x̂(k|k+2) = E{x(k)|Z(k+2)}
 = E{x(k)|Z(k+1), z(k+2)}
 = E{x(k)|Z(k+1), z̃(k+2|k+1)}
 = E{x(k)|Z(k+1)} + E{x(k)|z̃(k+2|k+1)} − m_x(k)    (20-23)
which can also be expressed as

x̂(k|k+2) = x̂(k|k+1) + P_{xz̃}(k, k+2|k+1)P_z̃⁻¹(k+2|k+1)z̃(k+2|k+1)    (20-24)

Defining double-stage smoother gain matrix M(k|k+2) as

M(k|k+2) = P_{xz̃}(k, k+2|k+1)P_z̃⁻¹(k+2|k+1)    (20-25)

(20-24) reduces to (20-21).

In order to show that M(k|k+2) in (20-25) can be expressed as in (20-22), one proceeds as in our derivation of M(k|k+1) in (20-12); however, the details are lengthier, because z̃(k+2|k+1) involves quantities that are two time units away from x(k), whereas z̃(k+1|k) involves quantities that are only one time unit away from x(k). Equation (20-13) is used during the derivation. We leave the detailed derivation of (20-22) as an exercise for the reader. □

Whereas (20-22) is a computationally useful formula, it is not useful from a theoretical viewpoint; i.e., when we examine M(k|k+1) in (20-8) and M(k|k+2) in (20-22), it is not at all obvious how to generalize these formulas to M(k|k+N), or even to M(k|k+3). The following result for M(k|k+2) is easily generalized.

Corollary 20-3. Kalman gain matrix K(k+2) is a factor in M(k|k+2); i.e.,

M(k|k+2) = A(k)A(k+1)K(k+2)    (20-26)

where A(k) is defined in (20-15).

Proof. Increment k to k+1 in (20-17) to see that

H′(k+2)[H(k+2)P(k+2|k+1)H′(k+2) + R(k+2)]⁻¹ = P⁻¹(k+2|k+1)K(k+2)    (20-27)

Next, note from (17-14) that

I − K(k+1)H(k+1) = P(k+1|k+1)P⁻¹(k+1|k)    (20-28)

hence,

[I − K(k+1)H(k+1)]′ = P⁻¹(k+1|k)P(k+1|k+1)    (20-29)

Substitute (20-27) and (20-29) into (20-22) to see that

M(k|k+2) = P(k|k)Φ′(k+1,k)P⁻¹(k+1|k)P(k+1|k+1)Φ′(k+2,k+1)P⁻¹(k+2|k+1)K(k+2)    (20-30)

Using the definition of matrix A(k) in (20-15), we see that (20-30) can be expressed as in (20-26). □

Corollary 20-4. Two other ways to express x̂(k|k+2) are:

x̂(k|k+2) = x̂(k|k+1) + A(k)A(k+1)[x̂(k+2|k+2) − x̂(k+2|k+1)]    (20-31)

and

x̂(k|k+2) = x̂(k|k) + A(k)[x̂(k+1|k+2) − x̂(k+1|k)]    (20-32)

Proof. The derivation of (20-31) follows exactly the same path as the derivation of (20-18), and is therefore left as an exercise for the reader. Equation (20-31) is the starting place for the derivation of (20-32). Observe, from (20-18), that

A(k+1)[x̂(k+2|k+2) − x̂(k+2|k+1)] = x̂(k+1|k+2) − x̂(k+1|k+1)    (20-33)

thus, (20-31) can be written as

x̂(k|k+2) = x̂(k|k+1) + A(k)[x̂(k+1|k+2) − x̂(k+1|k+1)]    (20-34)

Substituting (20-18) into (20-34), we obtain the desired result in (20-32). □

The alternate forms we have obtained for both x̂(k|k+1) and x̂(k|k+2) will suggest how we can generalize our single- and double-stage smoothers to N-stage smoothers.

SINGLE- AND DOUBLE-STAGE SMOOTHERS AS GENERAL SMOOTHERS

At the beginning of this lesson we described three types of smoothers, namely: fixed-interval, fixed-point, and fixed-lag smoothers. Table 20-1 shows how our single- and double-stage smoothers fit into these three categories.

In order to obtain the fixed-interval smoother formulas, given for both the single- and double-stage smoothers, set k+1 = N in (20-18) and k+2 = N in (20-32), respectively. Doing this forces the left-hand side of both equations to be conditioned on data length N. Observe that, before we can compute x̂(N−1|N) or x̂(N−2|N), we must run a Kalman filter on all of the data in order to obtain x̂(N|N). This last filtered state estimate initializes the backward-running fixed-interval smoother. Observe, also, that we must compute x̂(N−1|N) before we can compute x̂(N−2|N). Clearly, the limitation of our results so far is that we can only perform fixed-interval smoothing for
TABLE 20-1 Smoothing Interrelationships

Fixed-Interval
  Single-stage: x̂(N−1|N) = x̂(N−1|N−1) + A(N−1)[x̂(N|N) − x̂(N|N−1)]
  Double-stage: x̂(N−2|N) = x̂(N−2|N−2) + A(N−2)[x̂(N−1|N) − x̂(N−1|N−2)]

Fixed-Point [x̂(k|k+1) smoother, k fixed]
  Single-stage: the solution proceeds in forward time from k to k+1 on the filtering time scale and then back to k on the smoothing time scale; a one-unit time delay is present.
  Double-stage: the solution proceeds in forward time; results from the single-stage smoother as well as the optimal filter are required at t = k+1, whereas at t = k only results from the optimal filter are required; a two-unit time delay is present.

Fixed-Lag (k variable)
  Single-stage: x̂(k|k+1) = x̂(k|k) + A(k)[x̂(k+1|k+1) − x̂(k+1|k)]
  Double-stage: x̂(k|k+2) = x̂(k|k+1) + A(k)A(k+1)[x̂(k+2|k+2) − x̂(k+2|k+1)]
  (The smoother slides a window along the data: x̂(1|2), x̂(2|3), . . . for single-stage; x̂(1|3), x̂(2|4), . . . for double-stage.)
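The fixed-lag entries of Table 20-1 translate directly into code. Below, the single-stage smoother (20-18) runs alongside a scalar Kalman filter; the model (φ = 0.9, q = r = 1) is an illustrative assumption, and the smoothing-variance recursion used here follows the same pattern as the filter-to-smoother covariance update developed in Lesson 21.

```python
import numpy as np

rng = np.random.default_rng(1)

phi, q, r = 0.9, 1.0, 1.0            # assumed scalar model
x, xhat, p = 0.0, 0.0, 1.0
filt_var, smooth_var = [], []
for _ in range(200):
    x = phi * x + rng.normal(0.0, np.sqrt(q))
    z = x + rng.normal(0.0, np.sqrt(r))
    xpred, ppred = phi * xhat, phi**2 * p + q     # x^(k+1|k), p(k+1|k)
    K = ppred / (ppred + r)
    xnew, pnew = xpred + K * (z - xpred), (1.0 - K) * ppred
    A = p * phi / ppred                           # A(k), eq. (20-15)
    xs = xhat + A * (xnew - xpred)                # x^(k|k+1), eq. (20-18)
    ps = p + A * (pnew - ppred) * A               # p(k|k+1) (assumed form)
    filt_var.append(p)
    smooth_var.append(ps)
    xhat, p = xnew, pnew
```

Every smoothed variance is at most the corresponding filtered variance, which is the point of Table 20-1: conditioning on one extra measurement never hurts.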
PROBLEMS

20-1. Derive the formula for the double-stage smoother gain matrix M(k|k+2), given in (20-22).
20-2. Derive the alternative expression for x̂(k|k+2) given in (20-31).
20-3. Using the Fundamental Theorem of Estimation Theory, derive expressions for ŵ(k|k+1) and ŵ(k|k+2). These represent single- and double-stage estimators of disturbance w(k).

Lesson 21

INTRODUCTION

In Lesson 20 we introduced three general types of smoothers, namely, fixed-interval, fixed-point, and fixed-lag smoothers. We also developed formulas for single-stage and double-stage smoothers and showed how these specialized smoothers fit into the three general categories. In this lesson we shall develop general formulas for fixed-interval, fixed-point, and fixed-lag smoothers.
FIXED-INTERVAL SMOOTHERS
where

M(k|N) = P_{xz̃}(k, N|N−1)P_z̃⁻¹(N|N−1)    (21-3)

and then that

M(k|N) = A(k)A(k+1) ⋯ A(N−1)K(N)    (21-4)

and, finally, that

x̂(k|N) = x̂(k|N−1) + [∏_{i=k}^{N−1} A(i)][x̂(N|N) − x̂(N|N−1)]    (21-5)

x̂(N−2|N) = x̂(N−2|N−2) + A(N−2)[x̂(N−1|N) − x̂(N−1|N−2)]    (21-9)

Observe how x̂(N−1|N) is used in the calculation of x̂(N−2|N). Equation (21-6) was developed by Rauch, et al. (1965). Other algorithms for x̂(k|N) are:

Theorem 21-1. A useful mean-squared fixed-interval smoothed estimator of x(k), x̂(k|N), is given by the expression

x̂(k|N) = x̂(k|k) + A(k)[x̂(k+1|N) − x̂(k+1|k)]    (21-10)

Additionally, x(k+1) = x̂(k+1|N) + x̃(k+1|N), so that

P_x(k+1) = P_x̂(k+1|N) + P(k+1|N)    (21-18)

Equating (21-17) and (21-18), we find that

P_x̂(k+1|k) − P_x̂(k+1|N) = P(k+1|N) − P(k+1|k)    (21-19)

Finally, substituting (21-19) into (21-16), we obtain the desired expression for P(k|N), (21-11). □

We leave proof of the fact that {x̃(k|N), k = N, N−1, . . . , 0} is a zero-mean second-order Gauss-Markov process as an exercise for the reader.

Example 21-1

In order to illustrate fixed-interval smoothing, and obtain at the same time a comparison of the relative accuracies of smoothing and filtering, we return to Example 19-1. To review briefly, we have the scalar system x(k+1) = x(k) + w(k), with the scalar measurement z(k+1) = x(k+1) + v(k+1), and p_x(0) = 50, q = 20, and r = 5. In this example (which is similar to Example 6.1 in Meditch, 1969, p. 225) we choose N = 4 and compute quantities that are associated with x̂(k|4), where, from (21-10),

x̂(k|4) = x̂(k|k) + A(k)[x̂(k+1|4) − x̂(k+1|k)]    (21-20)

k = 3, 2, 1, 0. Because Φ = 1 and p(k+1|k) = p(k|k) + 20,

A(k) = p(k|k)Φp⁻¹(k+1|k) = p(k|k)/[p(k|k) + 20]    (21-21)

and, therefore,

p(k|4) = p(k|k) + A²(k)[p(k+1|4) − p(k+1|k)]    (21-22)

Utilizing these last two expressions, we compute A(k) and p(k|4) for k = 3, 2, 1, 0 and present them, along with the results summarized in Table 19-1, in Table 21-1. The three estimation error variances are given in adjacent columns for ease in comparison.

TABLE 21-1 Kalman Filter and Fixed-Interval Smoother Quantities

k | p(k|k−1) | p(k|k) | p(k|4) | K(k)  | A(k)
0 |    —     |   50   | 16.31  |  —    | 0.714
1 |   70     |  4.67  |  3.92  | 0.933 | 0.189
2 |  24.67   |  4.16  |  3.56  | 0.831 | 0.172
3 |  24.14   |  4.14  |  3.74  | 0.829 | 0.171
4 |  24.14   |  4.14  |  4.14  | 0.828 | 0.171

Observe the large improvement (percentage-wise) of p(k|4) over p(k|k). Improvement seems to get larger the farther away we get from the end of our data; thus, x̂(0|0) is an initial condition that is data independent, whereas x̂(0|4) is a result of processing z(1), z(2), z(3), and z(4). In essence, fixed-interval smoothing has let us look into the future and reflect the future back to time zero.

Finally, note that, for large values of k, A(k) reaches a steady-state value, Ā, where in this example

Ā = p̄(k|k)/[p̄(k|k) + 20] = 0.171    (21-23)

This steady-state value is achieved for k = 3. □

Equation (21-10) requires the multiplication of 3 n × n matrices, as well as a matrix inversion, at each iteration; hence, it is somewhat limited for practical computing purposes. The following results, which are due to Bryson and Frazier (1963) and Bierman (1973b), represent the most practical way for computing x̂(k|N) and also P(k|N).

Theorem 21-2. (a) A useful mean-squared fixed-interval smoothed estimator of x(k), x̂(k|N), is

x̂(k|N) = x̂(k|k−1) + P(k|k−1)r(k|N)    (21-24)

where k = N−1, N−2, . . . , 1, and n × 1 vector r satisfies the backward-recursive equation

r(j|N) = Φ_p′(j+1, j)r(j+1|N) + H′(j)[H(j)P(j|j−1)H′(j) + R(j)]⁻¹z̃(j|j−1)    (21-25)

where j = N, N−1, . . . , 1 and r(N+1|N) = 0.

(b) The smoothing error-covariance matrix, P(k|N), is

P(k|N) = P(k|k−1) − P(k|k−1)S(k|N)P(k|k−1)    (21-26)

where k = N−1, N−2, . . . , 1, and n × n matrix S(j|N), which is the covariance matrix of r(j|N), satisfies the backward-recursive equation

S(j|N) = Φ_p′(j+1, j)S(j+1|N)Φ_p(j+1, j) + H′(j)[H(j)P(j|j−1)H′(j) + R(j)]⁻¹H(j)    (21-27)

where j = N, N−1, . . . , 1 and S(N+1|N) = 0. Matrix Φ_p is defined in (21-33).

Proof (Mendel, 1983b, pp. 64-65). (a) Substitute the Kalman filter equation (17-11) for x̂(k|k) into Equation (21-10), to show that

x̂(k|N) = x̂(k|k−1) + K(k)z̃(k|k−1) + A(k)[x̂(k+1|N) − x̂(k+1|k)]    (21-28)

Residual state vector r(k|N) is defined as
P (014) is more than three times as small as p(O/Oj.Of course. it should be, because r(k IN) = P-(klk - l)[i(k IN) - i(klk - I)] (21-29)
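Theorem 21-1 translates directly into code. The sketch below (plain Python, written only for this scalar example; the variable names are our own, not the book's notation) reruns Example 21-1: a forward Kalman filter pass generates P(k|k − 1) and P(k|k), and the backward sweep (21-21)–(21-22) then produces A(k) and P(k|4).

```python
# Example 21-1: fixed-interval smoothing for x(k+1) = x(k) + w(k),
# z(k+1) = x(k+1) + v(k+1), with P(0|0) = 50, q = 20, r = 5, N = 4.
q, r, N = 20.0, 5.0, 4

# Forward Kalman filter pass: predicted and filtered error variances.
P_f = [50.0]                       # P(k|k), starting from P(0|0)
P_p = [None]                       # P(k|k-1); undefined at k = 0
for k in range(1, N + 1):
    Pp = P_f[-1] + q               # P(k|k-1) = P(k-1|k-1) + 20
    P_p.append(Pp)
    P_f.append(Pp * r / (Pp + r))  # P(k|k) after the measurement update

# Backward sweep, (21-21)-(21-22): A(k) = P(k|k)/(P(k|k) + 20) and
# P(k|4) = P(k|k) + A(k)^2 [P(k+1|4) - P(k+1|k)].
A = [P_f[k] / (P_f[k] + q) for k in range(N)]
P_s = [0.0] * (N + 1)
P_s[N] = P_f[N]                    # smoothing equals filtering at k = N
for k in range(N - 1, -1, -1):
    P_s[k] = P_f[k] + A[k] ** 2 * (P_s[k + 1] - P_p[k + 1])

for k in range(N + 1):
    print(k, round(P_f[k], 2), round(P_s[k], 2))
```

With exact arithmetic P(0|4) ≈ 16.28 and A(0) ≈ 0.714; the slightly different table entries reflect the rounded intermediate values used in the original hand computation.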
Next, substitute r(k|N) and r(k + 1|N), using (21-29), into (21-28) to show that

r(k|N) = P⁻¹(k|k − 1)[K(k)z̃(k|k − 1) + P(k|k)Φ′(k + 1, k)r(k + 1|N)]   (21-30)

From (17-12) and (17-13) and the symmetry of covariance matrices, we find that

P⁻¹(k|k − 1)K(k) = H′(k)[H(k)P(k|k − 1)H′(k) + R(k)]⁻¹   (21-31)

and

P⁻¹(k|k − 1)P(k|k) = [I − K(k)H(k)]′   (21-32)

Substituting (21-31) and (21-32) into Equation (21-30), and defining

Φ_p(k + 1, k) = Φ(k + 1, k)[I − K(k)H(k)]   (21-33)

we obtain Equation (21-25). Setting k = N + 1 in (21-29), we establish r(N + 1|N) = 0. Finally, solving (21-29) for x̂(k|N), we obtain Equation (21-24).

(b) The orthogonality principle in Corollary 13-3 leads us to conclude that

E{x̃(k|N)r′(k|N)} = 0   (21-34)

because r(k|N) is simply a linear combination of all the observations z(1), z(2), …, z(N). From (21-24) we find that x̃(k|N) = x̃(k|k − 1) − P(k|k − 1)r(k|N); using (21-34), taking the covariance of this expression leads to a relation involving S(k|N), which is the covariance matrix of r(k|N) [note that r(k|N) is zero mean]. Equation (21-36) is solved for P(k|N) to give the desired result in Equation (21-26). Because the innovations process is uncorrelated, (21-27) follows from substitution of (21-25) into (21-37). Finally, S(N + 1|N) = 0 because r(N + 1|N) = 0. ∎

Equations (21-24) and (21-25) are very efficient; they require no matrix inversions or multiplications of n × n matrices. The calculation of P(k|N) does require multiplications of n × n matrices.

Matrix Φ_p(k + 1, k) in (21-33) is the plant matrix of the recursive predictor (Lesson 16). It is interesting that the recursive predictor, and not the recursive filter, plays the predominant role in fixed-interval smoothing. This is further borne out by the appearance of predictor quantities on the right-hand side of (21-24). Observe that (21-25) looks quite similar to a recursive predictor which is excited by the innovations, one which is running in a backward direction.

Finally, note that (21-24) can also be used for k = N, in which case its right-hand side reduces to x̂(N|N − 1) + K(N)z̃(N|N − 1), which, of course, is x̂(N|N).

FIXED-POINT SMOOTHING

A fixed-point smoother, x̂(k|j), where j = k + 1, k + 2, …, can be obtained in exactly the same manner as we obtained fixed-interval smoother (21-5). It is obtained from that equation by setting N equal to j and then letting j = k + 1, k + 2, …; thus,

x̂(k|j) = x̂(k|j − 1) + B(j)[x̂(j|j) − x̂(j|j − 1)]   (21-38)

where

B(j) = ∏_{i=k}^{j−1} A(i)   (21-39)

and j = k + 1, k + 2, …. Additionally, one can show that the fixed-point smoothing error-covariance matrix, P(k|j), is computed from

P(k|j) = P(k|j − 1) + B(j)[P(j|j) − P(j|j − 1)]B′(j)   (21-40)

Theorem 21-3. A most useful mean-squared fixed-point smoothed estimator of x(k), x̂(k|k + l), where l = 1, 2, …, is given by the expression

x̂(k|k + l) = x̂(k|k + l − 1) + N_s(k|k + l)[z(k + l) − H(k + l)x̂(k + l|k + l − 1)]   (21-41)

where

N_s(k|k + l) = Σ_s(k, l)H′(k + l)[H(k + l)P(k + l|k + l − 1)H′(k + l) + R(k + l)]⁻¹   (21-42)

and

Σ_s(k, l) = Σ_s(k, l − 1)[I − K(k + l − 1)H(k + l − 1)]′Φ′(k + l, k + l − 1)   (21-43)
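Theorem 21-2 can be checked against Theorem 21-1 numerically, since both must produce the same P(k|N). The sketch below does this for the scalar system of Example 21-1 (Φ = 1, H = 1, q = 20, r = 5, P(0|0) = 50); it is a scalar specialization only, with our own variable names.

```python
# Scalar check that (21-26)-(21-27) reproduce the smoothed variances
# obtained from P(k|N) = P(k|k) + A(k)^2 [P(k+1|N) - P(k+1|k)].
q, r, N = 20.0, 5.0, 4

P_f, P_p, K = [50.0], [None], [None]   # P(k|k), P(k|k-1), gain K(k)
for k in range(1, N + 1):
    Pp = P_f[-1] + q
    Kk = Pp / (Pp + r)
    P_p.append(Pp)
    K.append(Kk)
    P_f.append((1 - Kk) * Pp)

# A(k)-form smoother covariance (Theorem 21-1).
P_rts = [0.0] * (N + 1)
P_rts[N] = P_f[N]
for k in range(N - 1, -1, -1):
    A = P_f[k] / P_p[k + 1]
    P_rts[k] = P_f[k] + A ** 2 * (P_rts[k + 1] - P_p[k + 1])

# Backward S-recursion (21-27); here Phi_p(j+1, j) = 1 - K(j), and
# (21-26) gives P(j|N) = P(j|j-1) - P(j|j-1)^2 S(j|N), j = N, ..., 1.
S = 0.0                                # S(N+1|N) = 0
P_bf = {}
for j in range(N, 0, -1):
    S = (1 - K[j]) ** 2 * S + 1.0 / (P_p[j] + r)
    P_bf[j] = P_p[j] - P_p[j] ** 2 * S

for j in range(1, N + 1):
    print(j, P_bf[j], P_rts[j])
```

The two columns agree to machine precision, which is a useful debugging check when implementing either form.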
Equations (21-41) and (21-43) are initialized by x̂(k|k) and Σ_s(k, 1) = P(k|k)Φ′(k + 1, k), respectively. Additionally,

P(k|k + l) = P(k|k + l − 1) − N_s(k|k + l)[H(k + l)P(k + l|k + l − 1)H′(k + l) + R(k + l)]N′_s(k|k + l)   (21-44)

which is initialized by P(k|k). ∎

We leave the proof of this useful theorem, which is similar to a result given by Fraser (1967), as an exercise for the reader.

Example 21-2
Here we consider the problem of fixed-point smoothing to obtain a refined estimate of the initial condition for the system described in Example 21-1. Recall that P(0|0) = 50 and that by fixed-interval smoothing we had obtained the result P(0|4) = 16.31, which is a significant reduction in the uncertainty associated with the initial condition.

Using Equation (21-40) or (21-44), we compute P(0|1), P(0|2), and P(0|3) to be 16.69, 16.32, and 16.31, respectively. Observe that a major reduction in the smoothing error variance occurs as soon as the first measurement is incorporated, and that the improvement in accuracy thereafter is relatively modest. This seems to be a general trait of fixed-point smoothing. ∎

Another way to derive fixed-point smoothing formulas is by the following state augmentation procedure (Anderson and Moore, 1979). We assume that

x₀(j) ≜ x(j)   (21-45)

The state equation for state vector x₀(k) is

x₀(k + 1) = x₀(k),  k ≥ j   (21-46)

It is initialized at k = j by (21-45). Augmenting (21-46) to our basic state-variable model in (15-17) and (15-18), we obtain an augmented basic state-variable model, and the following two-step procedure can be used to obtain an algorithm for x̂(j|k):

1. Write down the Kalman filter equations for the augmented basic state-variable model. Anderson and Moore (1979) give these equations for the recursive predictor; i.e., they find

E{col (x(k + 1), x₀(k + 1))|Z(k)} = col (x̂(k + 1|k), x̂₀(k + 1|k)) = col (x̂(k + 1|k), x̂(j|k))   (21-49)

where k ≥ j. The last equality in (21-49) makes use of (21-46) and (21-45). Observe that x̂(j|k), the fixed-point smoother of x(j), has been found as the second component of the recursive predictor for the augmented model.

2. The Kalman filter (or recursive predictor) equations are partitioned in order to obtain the explicit structure of the algorithm for x̂(j|k).

We leave the details of this two-step procedure as an exercise for the reader.

FIXED-LAG SMOOTHING

The earliest attempts to obtain a fixed-lag smoother x̂(k|k + L) led to an algorithm (e.g., Meditch, 1969) which was later shown to be unstable (Kelly and Anderson, 1971). The following state augmentation procedure leads to a stable fixed-lag smoother for x̂(k − L|k).

We introduce L + 1 state vectors, as follows: x₁(k + 1) = x(k), x₂(k + 1) = x(k − 1), x₃(k + 1) = x(k − 2), …, x_{L+1}(k + 1) = x(k − L) [i.e., x_i(k + 1) = x(k + 1 − i), i = 1, 2, …, L + 1]. The state equations for these L + 1 state vectors are

x₁(k + 1) = x(k)
x₂(k + 1) = x₁(k)
x₃(k + 1) = x₂(k)   (21-50)
⋮
x_{L+1}(k + 1) = x_L(k)

Augmenting (21-50) to our basic state-variable model in (15-17) and (15-18), we obtain yet another augmented basic state-variable model.
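The fixed-point recursion (21-38)–(21-40) is easily exercised on Example 21-2. A minimal sketch (plain Python, scalar case only; exact arithmetic gives 16.67, 16.29, 16.28, which match the quoted 16.69, 16.32, 16.31 to within the rounding of the original computation):

```python
# Example 21-2: refine the initial-condition variance P(0|j) by
# fixed-point smoothing, using (21-40) with B(j) = A(0)A(1)...A(j-1).
q, r, N = 20.0, 5.0, 4

P_f, P_p = [50.0], [None]
for k in range(1, N + 1):
    Pp = P_f[-1] + q
    P_p.append(Pp)
    P_f.append(Pp * r / (Pp + r))

A = [P_f[k] / (P_f[k] + q) for k in range(N)]   # A(k), as in (21-21)

P0j = [50.0]            # P(0|0), then P(0|1), P(0|2), ...
B = 1.0
for j in range(1, N + 1):
    B *= A[j - 1]       # B(j) = prod_{i=0}^{j-1} A(i), (21-39)
    P0j.append(P0j[-1] + B * (P_f[j] - P_p[j]) * B)   # (21-40)

print([round(p, 2) for p in P0j])
```

Note that P(0|4) from this recursion coincides with the fixed-interval value, as it must.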
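The augmentation idea can be tried out directly. For the scalar system of Example 21-1 with L = 1, the sketch below runs a Kalman filter on the augmented state [x(k), x₁(k)]′ and confirms that the variance of the x₁-component equals the fixed-point value P(k − 1|k) = P(k − 1|k − 1) + A²(k − 1)[P(k|k) − P(k|k − 1)]. The initialization x₁(0) = x(0) is our own convenience choice.

```python
import numpy as np

# Fixed-lag smoothing by state augmentation (L = 1) for the scalar
# system of Example 21-1: x(k+1) = x(k) + w(k), z(k) = x(k) + v(k),
# q = 20, r = 5, P(0|0) = 50.  Only covariances are propagated here.
q, r, nsteps = 20.0, 5.0, 6

F = np.array([[1.0, 0.0],        # X(k) = [x(k), x1(k)]'; x1(k+1) = x(k)
              [1.0, 0.0]])
G = np.array([[1.0], [0.0]])
H = np.array([[1.0, 0.0]])

P = np.full((2, 2), 50.0)        # x1(0) = x(0): fully correlated start
pf = 50.0                        # scalar P(k|k) for the cross-check
rows = []
for k in range(1, nsteps + 1):
    # augmented predict and update
    P = F @ P @ F.T + q * (G @ G.T)
    S = (H @ P @ H.T).item() + r
    K = P @ H.T / S
    P = P - K @ (H @ P)
    # fixed-point reference: P(k-1|k) = P(k-1|k-1) + A^2 [P(k|k) - P(k|k-1)]
    pp = pf + q
    A = pf / pp
    pf_new = pp * r / (pp + r)
    rows.append((P[1, 1], pf + A ** 2 * (pf_new - pp)))
    pf = pf_new

for a, b in rows:
    print(round(a, 4), round(b, 4))
```

The two columns agree to machine precision, illustrating that the augmented filter really does deliver lag-1 smoothed estimates as a by-product of ordinary filtering.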
The following two-step procedure can be used to obtain an algorithm for x̂(k − L|k):

1. Write down the Kalman filter equations for the augmented basic state-variable model. Anderson and Moore (1979) give these equations for the recursive predictor; i.e., they find

E{col (x(k + 1), x₁(k + 1), …, x_{L+1}(k + 1))|Z(k)}
 = col (x̂(k + 1|k), x̂₁(k + 1|k), …, x̂_{L+1}(k + 1|k))   (21-52)
 = col (x̂(k + 1|k), x̂(k|k), x̂(k − 1|k), …, x̂(k − L|k))

The last equality in (21-52) makes use of the fact that x_i(k + 1) = x(k + 1 − i), i = 1, 2, …, L + 1.

2. The Kalman filter (or recursive predictor) equations are partitioned in order to obtain the explicit structure of the algorithm for x̂(k − L|k). The detailed derivation of the algorithm for x̂(k − L|k) is left as an exercise for the reader (it can be found in Anderson and Moore, 1979, pp. 177–181).

PROBLEMS

21-1. Derive the formula for x̂(k|N) in (21-5) using mathematical induction. Then derive x̂(k|N) in (21-6).
21-2. Prove that {x̃(k|N), k = N, N − 1, …, 0} is a zero-mean, second-order Gauss–Markov process.
21-3. Derive the formula for the fixed-point smoothing error-covariance matrix, P(k|j), given in (21-40).
21-4. Prove Theorem 21-3, which gives formulas for a most useful mean-squared fixed-point smoother of x(k), x̂(k|k + l), l = 1, 2, ….
21-5. Using the two-step procedure described at the end of the section entitled Fixed-Point Smoothing, derive the resulting fixed-point smoother equations.
21-6. Using the two-step procedure described at the end of the section entitled Fixed-Lag Smoothing, derive the resulting fixed-lag smoother equations. Show, by means of a block diagram, that this smoother is stable.
21-7. (Meditch, 1969, Exercise 6-13, pg. 245). Consider the scalar system x(k + 1) = 2⁻ᵏx(k) + w(k), z(k + 1) = x(k + 1), k = 0, 1, …, where x(0) has mean zero and variance σ²₀, and w(k), k = 0, 1, …, is a zero-mean Gaussian white sequence which is independent of x(0) and has a variance equal to 4.
(a) Assuming that optimal fixed-point smoothing is to be employed to determine x̂(0|j), j = 1, 2, …, what is the equation for the appropriate smoothing filter?
(b) What is the limiting value of p(0|j) as j → ∞?
(c) How does this value compare with p(0|0)?
Lesson 22
State Estimation:
Smoothing Applications

INTRODUCTION

In this lesson we present some applications that illustrate interesting numerical and theoretical aspects of fixed-interval smoothing. These applications are taken from the field of digital signal processing.

MINIMUM-VARIANCE DECONVOLUTION (MVD)

Here, as in Examples 2-6 and 14-1, we begin with the convolutional model

z(k) = Σ_{i=1}^{k} μ(i)h(k − i) + v(k),  k = 1, 2, …, N   (22-1)

Recall that deconvolution is the signal-processing procedure for removing the effects of h(j) and v(j) from the measurements, so that one is left with an estimate of μ(j). Here we shall obtain a useful algorithm for a mean-squared fixed-interval estimator of μ(j).

To begin, we must convert (22-1) into an equivalent state-variable model.

Theorem 22-1 (Mendel, 1983a, pp. 13–14). The single-channel state-variable model

x(k + 1) = Φx(k) + γμ(k)   (22-2)

z(k) = h′x(k) + v(k)   (22-3)

is equivalent to convolutional model (22-1) [with h(i) = h′Φ^{i−1}γ].

Theorem 22-2 (Mendel, 1983a, pp. 68–70).

a. A two-pass fixed-interval smoother for μ(k) is

μ̂(k|N) = q(k)γ′r(k + 1|N)   (22-5)

and its error variance is

σ²_μ̃(k|N) = q(k) − q(k)γ′S(k + 1|N)γq(k)   (22-6)

where k = N − 1, N − 2, …, 1. In these formulas r(k|N) and S(k|N) are computed using (21-25) and (21-27), respectively, and E{μ²(k)} = q(k) [here q(k) denotes the variance of μ(k), and should not be confused with the event sequence, which appears in the product model for μ(k)].

Proof. a. To begin, we apply the fundamental theorem of estimation theory, Theorem 13-1, to (22-2). We operate on both sides of that equation with E{·|Z(N)}, to show that

γμ̂(k|N) = x̂(k + 1|N) − Φx̂(k|N)   (22-7)

By performing appropriate manipulations on this equation we can derive (22-5) as follows. Substitute x̂(k|N) and x̂(k + 1|N) from Equation (21-24) into Equation (22-7), to see that
the terms can be rearranged to form

μ(k) = μ̃(k|N) + q(k)γ′r(k + 1|N)   (22-16)

Taking the variance of both sides of (22-16), and using the orthogonality condition

E{μ̃(k|N)r′(k + 1|N)} = 0   (22-17)

we obtain the error variance in (22-6).

STEADY-STATE MVD FILTER

For a time-invariant IR and stationary noises, the Kalman gain matrix, as well as the error-covariance matrices, will reach steady-state values. When this occurs, both the Kalman innovations filter and the anticausal μ-filter [(22-5) and (21-25)] become time invariant, and we then refer to the MVD filter as a steady-state MVD filter. In this section we examine an important property of this steady-state filter.
Figure 22-1  (a) Fourth-order broadband channel IR, and (b) its squared amplitude spectrum (Chi, 1983).
Figure 22-2  (a) Fourth-order narrower-band channel IR, and (b) its squared amplitude spectrum (Chi and Mendel, 1984, © 1984 IEEE).
Figure 22-3  Measurements associated with (a) broadband channel (SNR = 10) and (b) narrower-band channel (SNR = 100) (Chi and Mendel, 1984, © 1984 IEEE).
Figure 22-4  μ̂(k|N) for (a) broadband channel (SNR = 10) and (b) narrower-band channel (SNR = 100). Circles depict true μ(k) and bars depict estimates of μ(k) (Chi and Mendel, 1984, © 1984 IEEE).
Let h_I(k) and h_μ(k) denote the IRs of the steady-state Kalman innovations and anticausal μ-filters, respectively. Then,

μ̂(k|N) = h_μ(k)*z̃(k|k − 1) = h_μ(k)*h_I(k)*z(k)   (22-21)
       = h_μ(k)*h_I(k)*h(k)*μ(k) + h_μ(k)*h_I(k)*v(k)

which can also be expressed as

μ̂(k|N) = μ̂_S(k|N) + n(k|N)   (22-22)

where the signal component of μ̂(k|N), μ̂_S(k|N), is

μ̂_S(k|N) = h_μ(k)*h_I(k)*h(k)*μ(k)   (22-23)

and the noise component of μ̂(k|N), n(k|N), is

n(k|N) = h_μ(k)*h_I(k)*v(k)   (22-24)

We shall refer to h_μ(k)*h_I(k) as the IR of the MVD filter, h_MV(k); i.e.,

h_MV(k) = h_μ(k)*h_I(k)   (22-25)

The following result has been proven by Chi and Mendel (1984) for the slightly modified model x(k + 1) = Φx(k) + γμ(k + 1) and z(k) = h′x(k) + v(k) [because of the μ(k + 1) input instead of the μ(k) input, h(0) ≠ 0].

Theorem 22-3.

a. The Fourier transform of h_MV(k) is

H_MV(ω) = qH*(ω)/[|H(ω)|²q + r]   (22-26)

where H*(ω) denotes the complex conjugate of H(ω); and

b. the signal component of μ̂(k|N), μ̂_S(k|N), is given by

μ̂_S(k|N) = R(k)*μ(k)   (22-27)

where R(k) is the autocorrelation function

R(k) = ρ[h(k)*h_I(k)]*[h(−k)*h_I(−k)]   (22-28)

in which ρ = q/σ²_z̃, where

σ²_z̃ = h′P̄h + r   (22-29)

is the steady-state innovations variance [P̄ denotes the steady-state prediction-error covariance matrix].

We leave the proof of this theorem as an exercise for the reader. Observe that part (b) of the theorem means that μ̂_S(k|N) is a zero-phase waveshaped version of μ(k). Observe, also, that R(ω) can be written as

R(ω) = [|H(ω)|²q/r]/[1 + |H(ω)|²q/r]   (22-31)

which demonstrates that q/r, and subsequently SNR, is an MVD filter tuning parameter. As q/r → ∞, R(ω) → 1 so that R(k) → δ(k); thus, for high signal-to-noise ratios μ̂_S(k|N) → μ(k). Additionally, when |H(ω)|²q/r ≫ 1, R(ω) → 1, and once again R(k) → δ(k). Broadband IRs often satisfy this condition. In general, however, μ̂_S(k|N) is a smeared-out version of μ(k); the nature of the smearing is quite dependent on the bandwidth of h(k) and SNR.

Example 22-2
This example is a continuation of Example 22-1. Figure 22-5 depicts R(k) for both the broadband and narrower-band IRs, h₁(k) and h₂(k), respectively. As predicted by (22-31), R₁(k) is much spikier than R₂(k), which explains why the MVD results for the broadband IR are quite sharp, whereas the MVD results for the narrower-band IR are smeared out. Note, also, the difference in peak amplitudes for R₁(k) and R₂(k). This explains why μ̂(k|N) underestimates the true values of μ(k) by such large amounts in the narrower-band case (see Figs. 22-4a and b). ∎

RELATIONSHIP BETWEEN STEADY-STATE MVD FILTER AND AN INFINITE IMPULSE RESPONSE DIGITAL WIENER DECONVOLUTION FILTER

We have seen that an MVD filter is a cascade of a causal Kalman innovations filter and an anticausal μ-filter; hence, it is a noncausal filter. Its impulse response extends from k = −∞ to k = +∞, and the IR of the steady-state MVD filter is given in the time domain by h_MV(k) in (22-25), or in the frequency domain by H_MV(ω) in (22-26).

There is a more direct way for designing an IIR minimum mean-squared error deconvolution filter, i.e., an IIR digital Wiener deconvolution filter, as we describe next.

We return to the situation depicted in Figure 19-4, but now we assume that: filter F(z) is an IIR filter, with coefficients {f(j), j = 0, ±1, ±2, …};

d(k) = μ(k)   (22-32)

where μ(k) is a white noise sequence; μ(k), v(k), and z(k) are stationary; and μ(k) and v(k) are uncorrelated. In this case, (19-39) becomes

Σ_{i=−∞}^{∞} f(i)φ_z(i − j) = φ_{μz}(j),  j = 0, ±1, ±2, …   (22-33)
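The tuning role of q/r in (22-31) is easy to see numerically. The sketch below evaluates R(ω) for a simple two-tap channel (an arbitrary illustrative choice) and also checks the identity R(ω) = H_MV(ω)H(ω) implied by (22-26) and (22-27):

```python
import numpy as np

h = np.array([1.0, 0.5])                 # channel IR h(0), h(1), illustrative
w = np.linspace(0.0, np.pi, 512)
Hw = h[0] + h[1] * np.exp(-1j * w)       # H(omega)

def R(snr):                              # Equation (22-31); snr plays q/r
    a = np.abs(Hw) ** 2 * snr
    return a / (1.0 + a)

for snr in (0.1, 1.0, 10.0, 1e6):
    print(snr, R(snr).min(), R(snr).max())

# H_MV(omega) from (22-26), with q/r = 10 (q = 10, r = 1):
q, r = 10.0, 1.0
Hmv = q * np.conj(Hw) / (np.abs(Hw) ** 2 * q + r)
```

As q/r grows, R(ω) approaches 1 across the band, i.e., R(k) approaches δ(k), exactly the tuning behavior described above.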
Using (22-1), the whiteness of μ(k), and the assumptions that μ(k) and v(k) are uncorrelated and stationary, it is straightforward to show that

φ_{μz}(i) = qh(i)   (22-34)

Substituting (22-34) into (22-33) then leads to the following result.

Theorem 22-4 (Chi and Mendel, 1984). The steady-state MVD filter, whose IR is given by h_MV(k), is exactly the same as Berkhout's IIR digital Wiener deconvolution filter. ∎

MAXIMUM-LIKELIHOOD DECONVOLUTION

Direct computation involves a large matrix that must be inverted. The following theorem provides a more practical way to compute r̂_MAP.

Theorem 22-5 (Mendel, 1983b). Unconditional maximum-likelihood (i.e., MAP) estimates of r can be obtained by applying MVD formulas to the state-variable model

x(k + 1) = Φx(k) + γq̂_MAP(k)r(k)   (22-40)

z(k) = h′x(k) + v(k)   (22-41)

where q̂_MAP(k) is a MAP estimate of q(k).

Proof. Example 14-2 showed that a MAP estimate of q can be obtained prior to finding a MAP estimate of r. By using the product model for μ(k), and q̂_MAP, our state-variable model in (22-2) and (22-3) can be expressed as in (22-40) and (22-41). Applying (14-41) to this system, we see that

r̂_MAP(k|N) = r̂_MS(k|N)   (22-42)

but, by comparing (22-40) and (22-2), and (22-41) and (22-3), we see that r̂_MS(k|N) can be found from the MVD algorithm in Theorem 22-2, in which we replace μ(k) by r(k) and set q(k) = σ²_r q̂_MAP(k). ∎

RECURSIVE WAVESHAPING

To begin, we must obtain the following state-variable models for h(k) and d(k) [i.e., we must use an approximate realization procedure (Kung, 1978) or any other viable technique to map {h(i), i = 0, 1, …, I₁} into {Φ₁, γ₁, h₁}, and {d(i), i = 0, 1, …, I₂} into {Φ₂, γ₂, h₂}]:

x₁(k + 1) = Φ₁x₁(k) + γ₁δ(k)   (22-43)

h(k) = h₁′x₁(k)   (22-44)

and

x₂(k + 1) = Φ₂x₂(k) + γ₂δ(k)   (22-45)

d(k) = h₂′x₂(k)   (22-46)

State vectors x₁ and x₂ are n₁ × 1 and n₂ × 1, respectively. Signal δ(k) is the unit spike.

In the stochastic situation depicted in Figure 22-6, where h(k) is excited by the white sequence w(k) and noise, v(k), corrupts s(k), the best we can possibly hope to achieve by waveshaping is to make z(k) = w(k)*h(k) + v(k) look like w(k)*d(k) (Figure 22-7). This is because both h(k) and d(k) must be excited by the same random input, w(k), for the waveshaping problem in this situation to be well posed. The state-variable model for this situation is obtained by driving both realizations (22-43)–(22-46) with w(k).

Figure 22-8  Details of recursive linear waveshaping filter (Mendel, 1983a, © 1983 IEEE).
Figure 22-9  Bernoulli–Gaussian input sequence (Mendel, 1983a, © 1983 IEEE).
Figure 22-10  Noise-corrupted signal z(k); signal-to-noise ratio chosen equal to ten (Mendel, 1983a, © 1983 IEEE).
Figure 22-11  d̂(k|N) (Mendel, 1983a, © 1983 IEEE).
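Short of a true approximate-realization procedure such as Kung's, a finite-length IR with h(0) = 0 [which (22-43)–(22-44) force, since x₁(0) = 0] can always be realized exactly by a shift register. The sketch below is only illustrative and is not the Kung (1978) algorithm; the target IR values are arbitrary:

```python
import numpy as np

h = np.array([0.9, -0.4, 0.2, 0.05])     # target IR h(1..4), illustrative
n = len(h)

Phi1 = np.diag(np.ones(n - 1), -1)       # shift matrix (subdiagonal ones)
gam1 = np.zeros(n); gam1[0] = 1.0        # excited by the unit spike
h1 = h.copy()                            # readout row

# Drive x1(k+1) = Phi1 x1(k) + gam1 delta(k) with delta(0) = 1, x1(0) = 0,
# and read out h(k) = h1' x1(k); this reproduces h(1..n), then zeros.
x1 = np.zeros(n)
out = []
for k in range(n + 2):
    out.append(h1 @ x1)
    x1 = Phi1 @ x1 + gam1 * (1.0 if k == 0 else 0.0)
print(out)
```

The design choice here is deliberate crudity: the shift-register model has dimension equal to the IR length, whereas an approximate realization trades exactness for a much smaller state dimension.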
Figure 22-12  d̂_S(k|N) when d₁(t) = e⁻³⁰⁰ᵗ (Mendel, 1983a, © 1983 IEEE).

More smoothing is achieved when d̂(k|N) is convolved with a zero-phase waveform. ∎

Finally, note that, because of Theorem 22-3, we know that perfect waveshaping is not possible. For example, the signal component of d̂(k|N), d̂_S(k|N), is given by the expression

d̂_S(k|N) = d(k)*R(k)*w(k)   (22-60)

How much the autocorrelation function R(k) will distort d̂_S(k|N) from d(k)*w(k) depends, of course, on bandwidth and signal-to-noise-ratio considerations.

PROBLEMS

22-1. Rederive the MVD algorithm for μ̂(k|N), which is given in (22-5), from the Fundamental Theorem of Estimation Theory, i.e., μ̂(k|N) = E{μ(k)|Z(N)}.
22-2. Prove Theorem 22-3. Explain why part (b) of the theorem means that μ̂_S(k|N) is a zero-phase waveshaped version of μ(k).
22-3. This problem is a memory refresher. You probably have either seen or carried out the calculations asked for in a course on random processes.
(a) Derive Equation (22-34);
(b) Derive Equation (22-37).
22-4. Prove the recursive waveshaping Theorem 22-7.

Lesson 23
State Estimation
for the Not-So-Basic
State-Variable Model

INTRODUCTION

In deriving all of our state estimators we assumed that our dynamical system could be modeled as in Lesson 15, i.e., as our basic state-variable model. The results so obtained are applicable only for systems that satisfy all the conditions of that model: the noise processes w(k) and v(k) are both zero mean, white, and mutually uncorrelated; no known bias functions appear in the state or measurement equations; and no measurements are noise-free (i.e., perfect). The following cases frequently occur in practice:

1. either nonzero-mean noise processes or known bias functions or both in the state or measurement equations,
2. correlated noise processes,
3. colored noise processes, and
4. some perfect measurements.

In this lesson we show how to modify some of our earlier results in order to treat these important special cases. In order to see the forest from the trees, we consider each of these four cases separately. In practice, some or all of them may occur together.
BIASES

Here we assume that our basic state-variable model, given in (15-17) and (15-18), has been modified to

x(k + 1) = Φ(k + 1, k)x(k) + Γ(k + 1, k)w₁(k) + Ψ(k + 1, k)u(k)   (23-1)

z(k + 1) = H(k + 1)x(k + 1) + G(k + 1)u(k + 1) + v₁(k + 1)   (23-2)

where w₁(k) and v₁(k) are nonzero-mean, individually and mutually uncorrelated Gaussian noise sequences; i.e.,

E{w₁(k)} = m_w₁(k) ≠ 0,  m_w₁(k) known   (23-3)

E{v₁(k)} = m_v₁(k) ≠ 0,  m_v₁(k) known   (23-4)

E{[w₁(i) − m_w₁(i)][w₁(j) − m_w₁(j)]′} = Q(i)δ_ij, E{[v₁(i) − m_v₁(i)][v₁(j) − m_v₁(j)]′} = R(i)δ_ij, and E{[w₁(i) − m_w₁(i)][v₁(j) − m_v₁(j)]′} = 0.

This case is handled by reducing (23-1) and (23-2) to our previous basic state-variable model, using the following simple transformations. Let

w(k) ≜ w₁(k) − m_w₁(k)   (23-5)

and

v(k) ≜ v₁(k) − m_v₁(k)   (23-6)

Observe that both w(k) and v(k) are zero-mean white noise processes, with covariances Q(k) and R(k), respectively. Adding and subtracting Γ(k + 1, k)m_w₁(k) in state equation (23-1) and m_v₁(k + 1) in measurement equation (23-2), these equations can be expressed as

x(k + 1) = Φ(k + 1, k)x(k) + Γ(k + 1, k)w(k) + u₁(k)   (23-7)

z₁(k + 1) = H(k + 1)x(k + 1) + v(k + 1)   (23-8)

where

u₁(k) = Ψ(k + 1, k)u(k) + Γ(k + 1, k)m_w₁(k)   (23-9)

and

z₁(k + 1) = z(k + 1) − G(k + 1)u(k + 1) − m_v₁(k + 1)   (23-10)

Clearly, (23-7) and (23-8) is once again a basic state-variable model, one in which u₁(k) plays the role of Ψ(k + 1, k)u(k) and z₁(k + 1) plays the role of z(k + 1).

Theorem 23-1. When biases are present in a state-variable model, then that model can always be reduced to a basic state-variable model [e.g., (23-7) to (23-10)]. All of our previous state estimators can be applied to this basic state-variable model by replacing z(k) by z₁(k) and Ψ(k + 1, k)u(k) by u₁(k). ∎

CORRELATED NOISES

Here we assume that our basic state-variable model is given by (15-17) and (15-18), except that now w(k) and v(k) are correlated; i.e.,

E{w(k)v′(k)} = S(k) ≠ 0   (23-11)

There are many approaches for treating correlated process and measurement noises, some leading to a recursive predictor, some to a recursive filter, and others to a filter in predictor-corrector form, as in the following:

Theorem 23-2. When w(k) and v(k) are correlated, then a predictor-corrector form of the Kalman filter is

x̂(k + 1|k) = Φ(k + 1, k)x̂(k|k) + Ψ(k + 1, k)u(k)
 + Γ(k + 1, k)S(k)[H(k)P(k|k − 1)H′(k) + R(k)]⁻¹z̃(k|k − 1)   (23-12)

and

x̂(k + 1|k + 1) = x̂(k + 1|k) + K(k + 1)z̃(k + 1|k)   (23-13)

where Kalman gain matrix K(k + 1) is given by (17-12), filtering-error covariance matrix P(k + 1|k + 1) is given by (17-14), and prediction-error covariance matrix P(k + 1|k) is given by

P(k + 1|k) = Φ₁(k + 1, k)P(k|k)Φ₁′(k + 1, k) + Q₁(k)   (23-14)

in which

Φ₁(k + 1, k) = Φ(k + 1, k) − Γ(k + 1, k)S(k)R⁻¹(k)H(k)   (23-15)

and

Q₁(k) = Γ(k + 1, k)Q(k)Γ′(k + 1, k) − Γ(k + 1, k)S(k)R⁻¹(k)S′(k)Γ′(k + 1, k)   (23-16)

Observe that, if S(k) = 0, then (23-12) reduces to the more familiar predictor equation (16-4), and (23-14) reduces to the more familiar (17-13).

Proof. The derivation of correction equation (23-13) is exactly the same, when w(k) and v(k) are correlated, as it was when w(k) and v(k) were assumed uncorrelated. See the proof of part (a) of Theorem 17-1 for the details.

In order to derive predictor equation (23-12), we begin with the Fundamental Theorem of Estimation Theory; i.e.,

x̂(k + 1|k) = E{x(k + 1)|Z(k)}   (23-17)

Substitute state equation (15-17) into (23-17), to show that

x̂(k + 1|k) = Φ(k + 1, k)x̂(k|k) + Ψ(k + 1, k)u(k) + Γ(k + 1, k)E{w(k)|Z(k)}   (23-18)
Next, we develop an expression for E{w(k)|Z(k)}. Let Z(k) = col (Z(k − 1), z(k)); then,

E{w(k)|Z(k)} = E{w(k)|Z(k − 1), z(k)}
 = E{w(k)|Z(k − 1), z̃(k|k − 1)}
 = E{w(k)|Z(k − 1)} + E{w(k)|z̃(k|k − 1)}
 = E{w(k)|z̃(k|k − 1)}   (23-19)

In deriving (23-19) we used the facts that w(k) is zero mean, and w(k) and Z(k − 1) are statistically independent. Because w(k) and z̃(k|k − 1) are jointly Gaussian,

E{w(k)|z̃(k|k − 1)} = P_wz̃(k, k|k − 1)P⁻¹_z̃z̃(k|k − 1)z̃(k|k − 1)   (23-20)

where P_z̃z̃ is given by (16-33), and

P_wz̃(k, k|k − 1) = E{w(k)z̃′(k|k − 1)}
 = E{w(k)[H(k)x̃(k|k − 1) + v(k)]′}
 = S(k)   (23-21)

In deriving (23-21) we used the facts that x̃(k|k − 1) and w(k) are statistically independent, and w(k) is zero mean. Substituting (23-21) and (16-33) into (23-20), we find that

E{w(k)|z̃(k|k − 1)} = S(k)[H(k)P(k|k − 1)H′(k) + R(k)]⁻¹z̃(k|k − 1)   (23-22)

Substituting (23-22) into (23-19), and the resulting equation into (23-18), completes our derivation of the recursive predictor equation (23-12).

We leave the derivation of (23-14) as an exercise. It is straightforward but algebraically tedious. ∎

Recall that the recursive predictor plays the predominant role in smoothing; hence, we present

Corollary 23-1. When w(k) and v(k) are correlated, then a recursive predictor for x(k + 1) is

x̂(k + 1|k) = Φ(k + 1, k)x̂(k|k − 1) + Ψ(k + 1, k)u(k) + L(k)z̃(k|k − 1)   (23-23)

where

L(k) = [Φ(k + 1, k)P(k|k − 1)H′(k) + Γ(k + 1, k)S(k)][H(k)P(k|k − 1)H′(k) + R(k)]⁻¹   (23-24)

and

P(k + 1|k) = [Φ(k + 1, k) − L(k)H(k)]P(k|k − 1)[Φ(k + 1, k) − L(k)H(k)]′
 + Γ(k + 1, k)Q(k)Γ′(k + 1, k)
 − Γ(k + 1, k)S(k)L′(k)
 − L(k)S′(k)Γ′(k + 1, k)
 + L(k)R(k)L′(k)   (23-25)

Proof. These results follow directly from Theorem 23-2; or, they can be derived in an independent manner, as explained in Problem 23-2. ∎

Corollary 23-2. When w(k) and v(k) are correlated, then a recursive filter for x(k + 1) is

x̂(k + 1|k + 1) = Φ₁(k + 1, k)x̂(k|k) + Ψ(k + 1, k)u(k) + D(k)z(k) + K(k + 1)z̃(k + 1|k)   (23-26)

where

D(k) = Γ(k + 1, k)S(k)R⁻¹(k)   (23-27)

and all other quantities have been defined above.

Proof. Again, these results follow directly from Theorem 23-2; however, they can also be derived, in a much more elegant and independent manner, as described in Problem 23-3. ∎

COLORED NOISES

Quite often, some or all of the elements of either v(k) or w(k) or both are colored (i.e., have finite bandwidth). The following three-step procedure is used in these cases:

1. model each colored noise by a low-order difference equation that is excited by white Gaussian noise;
2. augment the states associated with the step 1 colored noise models to the original state-variable model; and
3. apply the recursive filter or predictor to the augmented system.

We try to model colored noise processes by low-order Markov processes, i.e., low-order difference equations. Usually, first- or second-order models are adequate.
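The three-step colored-noise procedure can be exercised end to end on a toy problem: a first-order signal with first-order Markov (colored) process noise, observed in white noise. The filtered estimate from the augmented model must equal the batch Gaussian conditional mean. A sketch (NumPy; all parameter values are arbitrary test choices):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 15
a, b = 0.9, 0.7                 # signal pole; colored-noise pole
qxi, rv = 0.5, 1.0              # variance of xi(k); white measurement noise
p0x, p0n = 2.0, 1.0             # initial variances of x(0), n(0)

# x(k+1) = a x(k) + n(k),  n(k+1) = b n(k) + xi(k),  z(k) = x(k) + v(k).
# Batch model: e = [x0, n0, xi(0..K-2), v(0..K-1)], all independent.
dim = 2 + (K - 1) + K
D = np.diag([p0x, p0n] + [qxi] * (K - 1) + [rv] * K)
cx = np.zeros((K, dim)); cn = np.zeros((K, dim))
cx[0, 0] = 1.0; cn[0, 1] = 1.0
for k in range(K - 1):
    cx[k + 1] = a * cx[k] + cn[k]
    cn[k + 1] = b * cn[k]; cn[k + 1, 2 + k] += 1.0
M = cx.copy()
for k in range(K):
    M[k, 2 + (K - 1) + k] += 1.0          # z(k) = x(k) + v(k)

e = rng.multivariate_normal(np.zeros(dim), D)
z = M @ e
x_batch = cx[K - 1] @ D @ M.T @ np.linalg.solve(M @ D @ M.T, z)

# Steps 1-3: augment s = [x, n]' and run an ordinary Kalman filter.
F = np.array([[a, 1.0], [0.0, b]])
Q = np.array([[0.0, 0.0], [0.0, qxi]])
H = np.array([[1.0, 0.0]])
s_hat = np.zeros(2)
P = np.diag([p0x, p0n])
for k in range(K):
    Sv = (H @ P @ H.T).item() + rv
    Kg = (P @ H.T / Sv).ravel()
    s_hat = s_hat + Kg * (z[k] - (H @ s_hat).item())
    P = P - np.outer(Kg, (H @ P).ravel())
    if k < K - 1:
        s_hat = F @ s_hat
        P = F @ P @ F.T + Q

print(abs(s_hat[0] - x_batch))
```

Because the conditional mean of a linear-Gaussian model is unique, machine-precision agreement here confirms that the augmentation loses nothing.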
PERFECT MEASUREMENTS: REDUCED-ORDER ESTIMATORS

Because H(k + 1) is a nonzero matrix, (23-44) implies that P(k + 1|k + 1) must be a singular matrix. We leave it to the reader to show that once P(k + 1|k + 1) becomes singular it remains singular for all other values of k.

In this model y is l × 1. What makes the design of a reduced-order estimator challenging is the fact that the l perfect measurements are linearly related to the n states, i.e., H is rectangular.

To begin, we introduce a reduced-order state vector, p(k), whose dimension is (n − l) × 1; p(k) is assumed to be a linear transformation of x(k), i.e.,

p(k) ≜ Cx(k)   (23-47)

Augmenting (23-47) to (23-46), we obtain

col (y(k), p(k)) = col (H, C)x(k)   (23-48)

Design matrix C is chosen so that the stacked matrix col (H, C) is invertible. Of course, many different choices of C are possible; thus, this first step of our reduced-order estimator design procedure is nonunique. Let

[col (H, C)]⁻¹ = (L₁ | L₂)   (23-49)

where L₁ is n × l and L₂ is n × (n − l); thus,

x(k) = L₁y(k) + L₂p(k)   (23-50)

In order to obtain a filtered estimate of x(k), we operate on both sides of (23-50) with E{·|Z(k)}.

Additionally,

y(k + 1) = HΦ[L₁y(k) + L₂p(k)] + HΓw(k)
 = HΦL₂p(k) + HΦL₁y(k) + HΓw(k)   (23-54)

At time k + 1 we know y(k); hence, we can reexpress (23-54) as

y₁(k + 1) = HΦL₂p(k) + HΓw(k)   (23-55)

where

y₁(k + 1) ≜ y(k + 1) − HΦL₁y(k)   (23-56)

Before proceeding any farther, we make some important observations about our state-variable model in (23-53) and (23-55). First, the new measurement y₁(k + 1) represents a weighted difference between measurements y(k + 1) and y(k). The technique for obtaining our reduced-order state-variable model is, therefore, sometimes referred to as a measurement-differencing technique (e.g., Bryson and Johansen, 1965). Because we have already used y(k) to reduce the dimension of x(k) from n to n − l, we cannot again use y(k) alone as the measurements in our reduced-order state-variable model. As we have just seen, we must use both y(k) and y(k + 1).
Second, measurement equation (23-55) appears to be a combination of signal and noise. Unless HΓ = 0, the term HΓw(k) will act as the measurement noise in our reduced-order state-variable model. Its covariance matrix is HΓQΓ'H'. Unfortunately, it is possible for HΓ to equal the zero matrix. From linear system theory, we know that HΓ is the matrix of first Markov parameters for our original system in (23-45) and (23-46), and HΓ may equal zero. If this occurs, then we must repeat all of the above until we obtain a reduced-order state vector whose measurement equation is excited by white noise. We see, therefore, that, depending upon system dynamics, it is possible to obtain a reduced-order estimator of x(k) that uses a reduced-order Kalman filter of dimension less than n - 1.

Third, the noises that appear in state equation (23-53) and measurement equation (23-55) are the same, namely w(k); hence, the reduced-order state-variable model involves the correlated-noise case that we described earlier in this lesson in the section entitled Correlated Noises.

Finally, and most important, measurement equation (23-55) is nonstandard, in that it expresses y1 at k + 1 in terms of p at k rather than p at k + 1. Recall that the measurement equation in our basic state-variable model is z(k + 1) = Hx(k + 1) + v(k + 1). We cannot apply our Kalman filter equations to (23-53) and (23-55) until we express (23-55) in the standard way.

To proceed, we let ζ(k) = y1(k + 1), so that

ζ(k) = HΦL2 p(k) + HΓw(k)   (23-58)

Measurement equation (23-58) is now in the standard form; however, because ζ(k) equals a future value of y1, namely y1(k + 1), we must be very careful in applying our estimator formulas to our reduced-order model (23-53) and (23-58). In order to see this more clearly, we define the following two data sets,

Y1(k + 1) = {y1(1), y1(2), ..., y1(k + 1)}   (23-59)

and

Zζ(k) = {ζ(0), ζ(1), ..., ζ(k)}   (23-60)

Obviously,

Y1(k + 1) = Zζ(k)   (23-61)

Letting

p̂f(k + 1|k + 1) = E{p(k + 1)|Y1(k + 1)}   (23-62)

and

p̂p(k + 1|k) = E{p(k + 1)|Zζ(k)}   (23-63)

we see that

p̂f(k + 1|k + 1) = p̂p(k + 1|k)   (23-64)

Equation (23-64) tells us that, to obtain a recursive filter for our reduced-order model, that is, one in terms of data set Y1(k + 1), we must first obtain a recursive predictor for that model, which is in terms of data set Zζ(k). Then, wherever ζ(k) appears in the recursive predictor, it can be replaced by y1(k + 1). Using Corollary 23-1, applied to the reduced-order model in (23-53) and (23-58), we find that

p̂p(k + 1|k) = CΦL2 p̂p(k|k - 1) + CΦL1 y(k) + L(k)[ζ(k) - HΦL2 p̂p(k|k - 1)]   (23-65)

thus,

p̂f(k + 1|k + 1) = CΦL2 p̂f(k|k) + CΦL1 y(k) + L(k)[y1(k + 1) - HΦL2 p̂f(k|k)]   (23-66)

Equation (23-66) is our reduced-order Kalman filter. It provides filtered estimates of p(k + 1) and is only of dimension (n - 1) x 1. Of course, when L(k) and Pf(k + 1|k + 1) are computed using (23-13) and (23-14), respectively, we must make the following substitutions: Φ(k + 1,k) → CΦL2, H(k) → HΦL2, Γ(k + 1,k) → CΓ, Q(k) → Q, S(k) → QΓ'H', and R(k) → HΓQΓ'H'.

FINAL REMARK

In order to see the forest from the trees, we have considered each of our special cases separately. In actual practice, some or all of them may occur simultaneously. The exercises at the end of this lesson will permit the reader to gain experience with such cases.

PROBLEMS

23-1. Derive the prediction-error covariance equation (23-14).
23-2. Derive the recursive predictor, given in (23-23), by expressing x̂(k + 1|k) as E{x(k + 1)|Z(k)} = E{x(k + 1)|Z(k - 1), z̃(k|k - 1)}.
23-3. Here we derive the recursive filter, given in (23-26), by first adding a convenient form of zero to state equation (15-17), in order to decorrelate the process noise in this modified basic state-variable model from the measurement noise v(k). Add D(k)[z(k) - H(k)x(k) - v(k)] to (15-17). The process noise, w1(k), in the modified basic state-variable model, is equal to Γ(k + 1,k)w(k) - D(k)v(k). Choose decorrelation matrix D(k) so that E{w1(k)v'(k)} = 0. Then complete the derivation of (23-26). Observe that (23-14) can be obtained by inspection, via this derivation.
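The decorrelation step in Problem 23-3 can be checked numerically. Choosing D = ΓSR^{-1} makes E{w1(k)v'(k)} = ΓS - DR = 0; the following minimal Monte-Carlo sketch verifies this, with the numerical values of Γ, S, and R being our illustrative choices, not the book's:

```python
import numpy as np

rng = np.random.default_rng(3)

Gam = np.array([[1.0], [0.5]])   # Gamma (2 x 1), illustrative values
S = np.array([[0.4]])            # S = E{w(k) v(k)} for scalar w and v
R = np.array([[2.0]])            # R = E{v^2(k)}

# Decorrelation matrix: with D = Gam S R^{-1}, E{w1 v} = Gam S - D R = 0.
D = Gam @ S @ np.linalg.inv(R)

# Monte-Carlo check: draw correlated (w, v) pairs and form w1 = Gam w - D v.
n = 200_000
wv = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 2.0]], size=n)
w, v = wv[:, 0], wv[:, 1]
w1 = Gam * w - D * v             # shape (2, n)

print(np.abs((w1 * v).mean(axis=1)).max())   # sample E{w1 v}: close to 0
```

The same algebra carries over to the general matrix case, D(k) = Γ(k + 1,k)S(k)R^{-1}(k).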
23-4. In solving Problem 23-3, one arrives at the following predictor equation:

x̂(k + 1|k) = Φ(k + 1,k)x̂(k|k) + Ψ(k + 1,k)u(k) + D(k)z(k)

Beginning with this predictor equation and corrector equation (23-13), derive the recursive predictor given in (23-23).
23-5. Show that once P(k + 1|k + 1) becomes singular it remains singular for all other values of k.
23-6. Assume that R = 0, HΓ = 0, and HΦΓ ≠ 0. Obtain the reduced-order estimator and its associated reduced-order Kalman filter for this situation. Contrast this situation with the case given in the text, for which HΓ ≠ 0.
23-7. Develop a reduced-order estimator and its associated reduced-order Kalman filter for the case when l measurements are perfect and m - l measurements are noisy.
23-8. Consider the first-order system x(k + 1) = (1/2)x(k) + w1(k) and z(k + 1) = x(k + 1) + v(k + 1), where E{w1(k)} = 3, E{v(k)} = 0, w1(k) and v(k) are both white and Gaussian, E{w1²(k)} = 10, E{v²(k)} = 2, and w1(k) and v(k) are correlated, i.e., E{w1(k)v(k)} = 1.
(a) Obtain the steady-state recursive Kalman filter for this system.
(b) What is the steady-state filter error variance, and how does it compare with the steady-state predictor error variance?
23-9. Consider the first-order system x(k + 1) = (1/2)x(k) + w(k) and z(k + 1) = x(k + 1) + v(k + 1), where w(k) is a first-order Markov process and v(k) is Gaussian white noise with E{v²(k)} = 4 and Γ = 1.
(a) Let the model for w(k) be w(k + 1) = αw(k) + u(k), where u(k) is a zero-mean white Gaussian noise sequence for which E{u²(k)} = σu². Additionally, E{w(k)} = 0. What value must α have if E{w²(k)} = W² for all k?
(b) Suppose W² = 2 and σu² = 1. What are the Kalman filter equations for estimation of x(k) and w(k)?
23-10. Consider the first-order system x(k + 1) = -(1/2)x(k) + w(k) and z(k + 1) = x(k + 1) + v(k + 1), where w(k) is white and Gaussian [w(k) ~ N(w(k); 0, 1)] and v(k) is a colored noise process. The model for v(k) is summarized in Figure P23-10.
(a) Verify that a correct state-variable model for v(k) is the one given in Figure P23-10.

Figure P23-10

23-11. Obtain the equations from which we can find x̂1(k + 1|k + 1), x̂2(k + 1|k + 1), and v̂(k + 1|k + 1) for the following system:

x1(k + 1) = -x1(k) + x2(k)
x2(k + 1) = x2(k) + w(k)
z(k + 1) = x1(k + 1) + v(k + 1)

where v(k) is a colored noise process, i.e.,

v(k + 1) = -(1/2)v(k) + n(k)

Assume that w(k) and n(k) are white processes and are mutually uncorrelated, and that σw²(k) = 4 and σn²(k) = 2. Include a block diagram of the interconnected system and reduced-order KF.
23-12. Consider the system x(k + 1) = Φx(k) + γμ(k) and z(k + 1) = hx(k + 1) + v(k + 1), where μ(k) is a colored noise sequence and v(k) is zero-mean white noise. What are the formulas for computing μ̂(k|k + 1)? Filter μ̂(k|k + 1) is a deconvolution filter.
23-13. Consider the scalar moving average (MA) time-series model

z(k) = r(k) + r(k - 1)

where r(k) is a unit-variance, white Gaussian sequence. Show that the optimal one-step predictor for this model is [assume P(0|0) = 1]

ẑ(k + 1|k) = [k/(k + 1)][z(k) - ẑ(k|k - 1)]

(Hint: Express the MA model in state-space form.)
23-14. Consider the basic state-variable model for the stationary, time-invariant case. Assume also that w(k) and v(k) are correlated, i.e., E{w(k)v'(k)} = S.
(a) Show, from first principles, that the single-stage smoother of x(k), i.e., x̂(k|k + 1), is given by

x̂(k|k + 1) = x̂(k|k) + M(k|k + 1)z̃(k + 1|k)

where M(k|k + 1) is an appropriate smoother gain matrix.
(b) Derive a closed-form solution for M(k|k + 1) as a function of the correlation matrix S and other quantities of the basic state-variable model.
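The gain claimed in Problem 23-13 can be checked numerically. In state-space form, x(k) = r(k - 1) and z(k) = x(k) + r(k), so Φ = 0, Q = R = 1, and, because the process and measurement noises are the same sequence, S = E{w(k)v(k)} = 1. Iterating the correlated-noise predictor recursions (the variable names below are ours) reproduces the gain k/(k + 1):

```python
# One-step predictor gain for the MA(1) model z(k) = r(k) + r(k-1).
# State-space form: x(k) = r(k-1), x(k+1) = 0*x(k) + w(k) with w(k) = r(k),
# z(k) = x(k) + v(k) with v(k) = r(k); hence Q = R = 1 and S = E{w v} = 1.
phi, q, r, s = 0.0, 1.0, 1.0, 1.0

P = 1.0                                      # P(1|0), using P(0|0) = 1
for k in range(1, 20):
    K = (phi * P + s) / (P + r)              # correlated-noise predictor gain
    assert abs(K - k / (k + 1)) < 1e-12      # matches the claimed k/(k+1)
    P = phi * P * phi + q - K * K * (P + r)  # prediction-error variance update

print("predictor gain equals k/(k+1) for k = 1..19")
```

The recursion also shows P(k + 1|k) = 1/(k + 1), so the prediction-error variance shrinks toward zero as more data arrive.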
Lesson 24
Linearization and Discretization of Nonlinear Systems

INTRODUCTION

Many real-world systems are continuous-time in nature and quite a few are also nonlinear. For example, the state equations associated with the motion of a satellite of mass m about a spherical planet of mass M, in a planet-centered coordinate system, are nonlinear, because the planet's force field obeys an inverse square law. Figure 24-1 depicts a situation where the measurement equation is nonlinear. The measurement is angle φ, and is expressed in a rectangular coordinate system, i.e., φi = tan⁻¹[y/(x - xi)]. Sometimes the state equation may be nonlinear and the measurement equation linear, or vice versa, or they may both be nonlinear. Occasionally, the coordinate system in which one chooses to work causes the two former situations. For example, equations of motion in a polar coordinate system are nonlinear, whereas the measurement equations are linear. In a polar coordinate system, where φ is a state variable, the measurement equation for the situation depicted in Figure 24-1 is zi = φi, which is linear. In a rectangular coordinate system, on the other hand, equations of motion are linear, but the measurement equations are nonlinear.

Finally, we may begin with a linear system that contains some unknown parameters. When these parameters are modeled as first-order Markov processes, and these models are augmented to the original system, the augmented model is nonlinear, because the parameters that appeared in the original linear model are treated as states. We shall describe this situation in much more detail in Lesson 25.

Figure 24-1 Coordinate system for an angular measurement between two objects A and B.

A DYNAMICAL MODEL

The starting point for this lesson is the nonlinear state-variable model

ẋ(t) = f[x(t),u(t),t] + G(t)w(t)   (24-1)

z(t) = h[x(t),u(t),t] + v(t)   (24-2)

We shall assume that measurements are only available at specific values of time, namely at t = ti, i = 1, 2, ...; thus, our measurement equation will be treated as a discrete-time equation, whereas our state equation will be treated as a continuous-time equation. State vector x(t) is n x 1; u(t) is an l x 1 vector of known inputs; measurement vector z(t) is m x 1; ẋ(t) is short for dx(t)/dt; nonlinear functions f and h may depend both implicitly and explicitly on t, and we assume that both f and h are continuous and continuously differentiable with respect to all the elements of x and u; w(t) is a continuous-time white noise process, i.e., E{w(t)} = 0 and

E{w(t)w'(τ)} = Q(t)δ(t - τ)   (24-3)

v(ti) is a discrete-time white noise process, i.e., E{v(ti)} = 0 for t = ti, i = 1, 2, ..., and

E{v(ti)v'(tj)} = R(ti)δij   (24-4)

and w(t) and v(ti) are mutually uncorrelated at all t = ti, i.e.,

E{w(t)v'(ti)} = 0 for t = ti, i = 1, 2, ...   (24-5)
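A nonlinear model of the form (24-1) can be simulated, and a nominal trajectory generated, by numerical integration of f between sampling instants. The following sketch does this for the satellite equations of motion used later in this lesson; the constants m = 1, γ = 1, r0 = 1 and the circular-orbit initial condition are our illustrative choices:

```python
import numpy as np

m_sat, gamma, r0 = 1.0, 1.0, 1.0       # illustrative constants (our choices)

def f(x, u):
    """Satellite state equation: x = [r, rdot, theta, thetadot]."""
    r, rdot, th, thdot = x
    return np.array([rdot,
                     r * thdot**2 - gamma / r**2 + u[0] / m_sat,
                     thdot,
                     -2.0 * rdot * thdot / r + u[1] / (m_sat * r)])

def h(x):
    """Measurement: distance from the planet's surface."""
    return x[0] - r0

def rk4_step(x, u, dt):
    """One classical Runge-Kutta step of the continuous-time state equation."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Nominal trajectory x*(t) for u*(t) = 0: a circular orbit with r = 1, thetadot = 1.
x = np.array([1.0, 0.0, 0.0, 1.0])
for _ in range(1000):
    x = rk4_step(x, np.zeros(2), 0.01)

print(x[0], h(x))   # radius stays near 1; the noise-free measurement stays near 0
```

On this circular orbit the radial acceleration r·θ̇² - γ/r² is exactly zero, so the integration serves as a self-check of the implemented dynamics.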
Example 24-1
Here we expand upon the previously mentioned satellite-planet example. Our example is taken from Meditch (1969, pp. 60-61), who states: ". . . Assuming that the planet's force field obeys an inverse square law, and that the only other forces present are the satellite's two thrust forces u_r(t) and u_θ(t) (see Figure 24-2), and that the satellite's initial position and velocity vectors lie in the plane, we know from elementary particle mechanics that the satellite's motion is confined to the plane and is governed by the two equations

r̈ = rθ̇² - γ/r² + (1/m)u_r(t)   (24-6)

θ̈ = -2ṙθ̇/r + (1/mr)u_θ(t)   (24-7)

where γ = GM and G is the universal gravitational constant."

Defining x1 = r, x2 = ṙ, x3 = θ, x4 = θ̇, u1 = u_r, and u2 = u_θ, we have the state equation

ẋ(t) = col(x2, x1x4² - γ/x1² + (1/m)u1, x4, -2x2x4/x1 + (1/mx1)u2)   (24-8)

which is of the form in (24-1). ". . . Assuming . . . that the measurement made on the satellite during its motion is simply its distance from the surface of the planet, we have the scalar measurement equation

z(t) = r(t) - r0 + v(t) = x1(t) - r0 + v(t)   (24-9)

where r0 is the planet's radius."

Comparing (24-8) and (24-1), and (24-9) and (24-2), we conclude that

f[x(t),u(t),t] = col(x2, x1x4² - γ/x1² + (1/m)u1, x4, -2x2x4/x1 + (1/mx1)u2)   (24-10)

and

h[x(t),u(t),t] = x1 - r0   (24-11)

Observe that, in this example, only the state equation is nonlinear.

Figure 24-2 Schematic for satellite-planet system (Copyright 1969, McGraw-Hill).

LINEAR PERTURBATION EQUATIONS

In this section we shall linearize our nonlinear dynamical model in (24-1) and (24-2) about nominal values of x(t) and u(t), x*(t) and u*(t), respectively. If we are given a nominal input, u*(t), then x*(t) satisfies the following nonlinear differential equation,

ẋ*(t) = f[x*(t),u*(t),t]   (24-12)

and associated with x*(t) and u*(t) is the following nominal measurement, z*(t), where

z*(t) = h[x*(t),u*(t),t]   t = ti, i = 1, 2, ...   (24-13)

Throughout this lesson, we shall assume that x*(t) exists. We discuss two methods for choosing x*(t) in Lesson 25. Obviously, one is just to solve (24-12) for x*(t).

Note that x*(t) must provide a good approximation to the actual behavior of the system. The approximation is considered good if the difference between the nominal and actual solutions can be described by a system of linear differential equations, called linear perturbation equations. We derive these equations next.

Let

δx(t) = x(t) - x*(t)   (24-14)

and

δu(t) = u(t) - u*(t)   (24-15)

Then

δẋ(t) = f[x(t),u(t),t] - f[x*(t),u*(t),t] + G(t)w(t)   (24-16)
Fact 1. When f[x(t),u(t),t] is expanded in a Taylor series about x*(t) and u*(t), we obtain

f[x(t),u(t),t] = f[x*(t),u*(t),t] + Fx[x*(t),u*(t),t]δx(t) + Fu[x*(t),u*(t),t]δu(t) + higher-order terms   (24-17)

where Fx and Fu are n x n and n x l Jacobian matrices whose (i,j) entries are ∂fi/∂xj and ∂fi/∂uj, respectively, evaluated at x*(t) and u*(t), where i = 1, 2, ..., n. Collecting these n equations together in vector-matrix format, we obtain (24-17), in which Fx and Fu are defined in (24-18) and (24-19), respectively.

Substituting (24-17) into (24-16) and neglecting the higher-order terms, we obtain the following perturbation state equation

δẋ(t) = Fx[x*(t),u*(t),t]δx(t) + Fu[x*(t),u*(t),t]δu(t) + G(t)w(t)   (24-23)

Observe that, even if our original nonlinear differential equation is not an explicit function of time (i.e., f[x(t),u(t),t] = f[x(t),u(t)]), our perturbation state equation is always time-varying, because Jacobian matrices Fx and Fu vary with time, because x* and u* vary with time.

Next, let

δz(t) = z(t) - z*(t)   (24-24)

A Taylor series expansion of h[x(t),u(t),t] about x*(t) and u*(t), analogous to (24-17), gives (24-25). We leave the derivation of this fact to the reader, because it is analogous to the derivation of the Taylor series expansion of f[x(t),u(t),t].

Substituting (24-25) into (24-24) and neglecting the higher-order terms, we obtain the following perturbation measurement equation

δz(t) = Hx[x*(t),u*(t),t]δx(t) + Hu[x*(t),u*(t),t]δu(t) + v(t)   t = ti, i = 1, 2, ...   (24-30)

Equations (24-23) and (24-30) constitute our linear perturbation equations, or our linear perturbation state-variable model.
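The Jacobians Fx and Fu of (24-18) and (24-19) can be approximated, or checked, by central finite differences about (x*(t), u*(t)). A generic sketch follows; the step size 1e-6 is an arbitrary choice of ours:

```python
import numpy as np

def jacobians(f, x_star, u_star, eps=1e-6):
    """Central-difference Jacobians Fx (n x n) and Fu (n x l) of f about (x*, u*)."""
    n, l = len(x_star), len(u_star)
    Fx, Fu = np.zeros((n, n)), np.zeros((n, l))
    for j in range(n):
        dx = np.zeros(n); dx[j] = eps
        Fx[:, j] = (f(x_star + dx, u_star) - f(x_star - dx, u_star)) / (2 * eps)
    for j in range(l):
        du = np.zeros(l); du[j] = eps
        Fu[:, j] = (f(x_star, u_star + du) - f(x_star, u_star - du)) / (2 * eps)
    return Fx, Fu

# Sanity check: for a linear f the Jacobians are the system matrices themselves.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
Fx, Fu = jacobians(lambda x, u: A @ x + B @ u, np.ones(2), np.zeros(1))
print(np.allclose(Fx, A), np.allclose(Fu, B))   # True True
```

Because the nominal (x*, u*) varies with time, such Jacobians must be re-evaluated along the nominal trajectory, which is exactly why the perturbation model is time-varying.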
For the satellite example, the remaining Jacobian matrices are

Fu[x*(t),u*(t),t] = Fu = [0, 0; 1/m, 0; 0, 0; 0, 1/(mx1)]*

Hx[x*(t),u*(t),t] = Hx = (1 0 0 0)

and

Hu[x*(t),u*(t),t] = 0

In the equation for Fx[x*(t)], the notation ( )* means that all xi(t) within the matrix are nominal values, i.e., xi(t) = xi*(t).

Observe that the linearized satellite-planet system is time-varying, because its linearized plant matrix, Fx[x*(t)], depends upon the nominal trajectory x*(t).

DISCRETIZATION OF A LINEAR TIME-VARYING STATE-VARIABLE MODEL

In this section we describe how one discretizes the general linear, time-varying state-variable model

ẋ(t) = F(t)x(t) + C(t)u(t) + G(t)w(t)   (24-31)

z(t) = H(t)x(t) + v(t)   t = ti, i = 1, 2, ...   (24-32)

The application of this section's results to the perturbation state-variable model is given in the following section.

In (24-31) and (24-32), x(t) is n x 1, control input u(t) is l x 1, process noise w(t) is p x 1, and z(t) and v(t) are each m x 1. Additionally, w(t) is a continuous-time white noise process, v(ti) is a discrete-time white noise process, and w(t) and v(ti) are mutually uncorrelated at all t = ti, i = 1, 2, ..., with

E{w(t)w'(τ)} = Q(t)δ(t - τ)   (24-33)

The solution of (24-31) is

x(t) = Φ(t,t0)x(t0) + ∫[t0,t] Φ(t,τ)[C(τ)u(τ) + G(τ)w(τ)]dτ   (24-36)

where state transition matrix Φ(t,τ) is the solution to the following matrix homogeneous differential equation,

Φ̇(t,τ) = F(t)Φ(t,τ),   Φ(t,t) = I   (24-37)

This result should be a familiar one to the readers of this book; hence, we omit its proof.

Next, we assume that u(t) is a piecewise-constant function of time for t ∈ [tk, tk+1], and set t0 = tk and t = tk+1 in (24-36), to obtain

x(tk+1) = Φ(tk+1,tk)x(tk) + {∫[tk,tk+1] Φ(tk+1,τ)C(τ)dτ}u(tk) + ∫[tk,tk+1] Φ(tk+1,τ)G(τ)w(τ)dτ   (24-38)

which can also be written as

x(k + 1) = Φ(k + 1,k)x(k) + Ψ(k + 1,k)u(k) + wd(k)   (24-39)

where

Φ(k + 1,k) = Φ(tk+1,tk)   (24-40)

Ψ(k + 1,k) = ∫[tk,tk+1] Φ(tk+1,τ)C(τ)dτ   (24-41)

and wd(k) is a discrete-time white Gaussian sequence that is statistically equivalent to

∫[tk,tk+1] Φ(tk+1,τ)G(τ)w(τ)dτ   (24-42)

The mean and covariance matrices of wd(k) are zero and

Qd(k) = ∫[tk,tk+1] Φ(tk+1,τ)G(τ)Q(τ)G'(τ)Φ'(tk+1,τ)dτ   (24-43)

respectively.

Observe, from the right-hand sides of Equations (24-40), (24-41), and (24-43), that these quantities can be computed from knowledge about F(t), C(t), G(t), and Q(t). In general, we must compute Φ(k + 1,k), Ψ(k + 1,k), and Qd(k) using numerical integration, and these matrices change from one time interval to the next because F(t), C(t), G(t), and Q(t) usually change from one time interval to the next.

Because our measurements have been assumed to be available only at sampled values of t, namely at t = ti, i = 1, 2, ..., we can express (24-32) as a discrete-time measurement equation in the usual way.

Suppose now that F(t), C(t), G(t), and Q(t) are approximately constant, equal to Fk, Ck, Gk, and Qk, over the sampling interval [tk, tk+1] of length T, so that Φ(k + 1,k) = exp(FkT)   (24-47). For sufficiently small values of T,

exp(FkT) ≈ I + FkT   (24-49)

We use this approximation for exp(FkT) in deriving simpler expressions for Ψ(k + 1,k) and Qd(k). From (24-41),

Ψ(k + 1,k) ≈ ∫[tk,tk+1] [I + Fk(tk+1 - τ)]Ck dτ   (24-50)

where we have truncated Φ(k + 1,k) to its first-order term in T. Proceeding in a similar manner for Qd(k), it is straightforward to show that

Qd(k) ≈ GkQkGk'T   (24-51)

Comparable results can be obtained for higher-order truncations of exp(FkT). Note that (24-47), (24-49), (24-50), and (24-51), while much simpler than their original expressions, can change in values from one time interval to another, because of their dependence upon k.

For the discretized perturbation state-variable model, the corresponding quantities are

Ψ(k + 1,k;*) = ∫[tk,tk+1] Φ(tk+1,τ;*)Fu[x*(τ),u*(τ),τ]dτ   (24-56)

and

Qd(k;*) = ∫[tk,tk+1] Φ(tk+1,τ;*)G(τ)Q(τ)G'(τ)Φ'(tk+1,τ;*)dτ   (24-57)
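The first-order truncations (24-49)-(24-51) can be compared against the exact Φ = exp(FkT) numerically. A sketch for constant, illustrative F, C, G, Q (our values, not the book's); the truncated power series stands in for a library matrix-exponential routine:

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential via truncated power series (fine for small ||A||)."""
    E, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        E = E + term
    return E

F = np.array([[0.0, 1.0], [-2.0, -0.5]])   # illustrative constant matrices
C = np.array([[0.0], [1.0]])
G = np.eye(2)
Q = 0.1 * np.eye(2)
T = 0.01                                   # sampling interval

Phi_exact = expm_series(F * T)                    # (24-47): Phi = exp(F T)
Phi_1 = np.eye(2) + F * T                         # (24-49): first-order truncation
Psi_1 = (np.eye(2) * T + F * (T**2 / 2.0)) @ C    # (24-50) integrated in closed form
Qd_1 = G @ Q @ G.T * T                            # (24-51)

print(np.abs(Phi_exact - Phi_1).max())   # discrepancy is O(T^2)
```

For time-varying F(t), C(t), G(t), Q(t), the same computations would be redone on each sampling interval, which is the per-interval recomputation noted above.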
Lesson 25
Iterated Least Squares and Extended Kalman Filtering

This lesson is primarily devoted to the extended Kalman filter (EKF), which is a form of the Kalman filter extended to nonlinear dynamical systems of the type described in Lesson 24. We shall show that the EKF is related to the method of iterated least squares (ILS), the major difference being that the EKF is for dynamical systems whereas ILS is not.

ITERATED LEAST SQUARES

We shall illustrate the method of ILS for the nonlinear model described in Example 2-5 of Lesson 2, i.e., for the model

z(k) = f(θ,k) + v(k)   (25-1)

where k = 1, 2, ..., N.

Iterated least squares is basically a four-step procedure:

1. Linearize f(θ,k) about a nominal value of θ, θ*. Doing this, we obtain the perturbation measurement equation

δz(k) = Fθ(k;θ*)δθ + v(k)   k = 1, 2, ..., N   (25-2)

where

δz(k) = z(k) - z*(k) = z(k) - f(θ*,k)   (25-3)

δθ = θ - θ*   (25-4)

2. Concatenate (25-2) and compute δθ̂_LS(N) using our Lesson 3 formulas.
3. Compute θ̂ = θ* + δθ̂_LS(N).
4. Replace θ* with θ̂ and return to step 1; iterate until the estimates converge.

We observe, from this four-step procedure, that ILS uses the estimate obtained from the linearized model to generate the nominal value of θ about which the nonlinear model is relinearized. Additionally, in each complete cycle of this procedure, we use both the nonlinear and linearized models. The nonlinear model is used to compute z*(k), and subsequently δz(k), using (25-3).

The notions of relinearizing about a filter output and using both the nonlinear and linearized models are also at the very heart of the EKF.

EXTENDED KALMAN FILTER

The nonlinear dynamical system of interest to us is the one described in Lesson 24. For convenience to the reader, we summarize aspects of that system next. The nonlinear state-variable model is

ẋ(t) = f[x(t),u(t),t] + G(t)w(t)   (25-9)

z(t) = h[x(t),u(t),t] + v(t)   t = ti, i = 1, 2, ...   (25-10)

Given a nominal input, u*(t), and assuming that a nominal trajectory, x*(t), exists, x*(t) and its associated nominal measurement satisfy the following nominal system model,

ẋ*(t) = f[x*(t),u*(t),t]   (25-11)

z*(t) = h[x*(t),u*(t),t]   t = ti, i = 1, 2, ...   (25-12)
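The four-step ILS procedure described above can be sketched for a scalar parameter. The model f(θ,k) = exp(-0.1θk) below is our illustrative choice (it is not the model of Example 2-5); each pass linearizes about the current nominal θ*, solves the concatenated least-squares problem for δθ, and relinearizes:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(theta, k):
    """Illustrative scalar nonlinear model (our choice)."""
    return np.exp(-0.1 * theta * k)

theta_true, N = 1.3, 50
k = np.arange(1, N + 1)
z = f(theta_true, k) + 0.01 * rng.standard_normal(N)   # measurements, as in (25-1)

theta = 0.5                              # initial nominal theta*
for _ in range(20):
    dz = z - f(theta, k)                 # step 1: delta-z about theta*, per (25-3)
    F = -0.1 * k * f(theta, k)           # sensitivity d f / d theta at theta*
    dtheta = (F @ dz) / (F @ F)          # step 2: least-squares delta-theta
    theta += dtheta                      # step 3; step 4: relinearize and repeat

print(theta)   # close to theta_true = 1.3
```

Note that each cycle uses the nonlinear model (to form z* and δz) and the linearized model (to solve for δθ), exactly the two-model structure emphasized above.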
Letting δx(t) = x(t) - x*(t), δu(t) = u(t) - u*(t), and δz(t) = z(t) - z*(t), we also have the following discretized perturbation state-variable model that is associated with a linearized version of the original nonlinear state-variable model,

δx(k + 1) = Φ(k + 1,k;*)δx(k) + Ψ(k + 1,k;*)δu(k) + wd(k)   (25-13)

δz(k + 1) = Hx(k + 1;*)δx(k + 1) + Hu(k + 1;*)δu(k + 1) + v(k + 1)   (25-14)

In deriving (25-13) and (25-14), we made the important assumption that higher-order terms in the Taylor series expansions of f[x(t),u(t),t] and h[x(t),u(t),t] could be neglected. Of course, this is only correct as long as x(t) is close to x*(t) and u(t) is close to u*(t).

If u(t) is an input derived from a feedback control law, so that u(t) = u[x(t),t], then u(t) can differ from u*(t), because x(t) will differ from x*(t). On the other hand, if u(t) does not depend on x(t), then usually u(t) is the same as u*(t), in which case δu(t) = 0. We see, therefore, that x*(t) is the critical quantity in the calculation of our discretized perturbation state-variable model.

Suppose x*(t) is given a priori; then we can compute predicted, filtered, or smoothed estimates of δx(k) by applying all of our previously derived estimators to the discretized perturbation state-variable model in (25-13) and (25-14). We can precompute x*(t) by solving the nominal differential equation (25-11). The Kalman filter associated with using a precomputed x*(t) is known as a relinearized KF.

A relinearized KF usually gives poor results, because it relies on an open-loop strategy for choosing x*(t). When x*(t) is precomputed there is no way of forcing x*(t) to remain close to x(t), and this must be done or else the perturbation state-variable model is invalid. Divergence of the relinearized KF often occurs; hence, we do not recommend the relinearized KF.

The relinearized KF is based only on the discretized perturbation state-variable model. It does not use the nonlinear nature of the original system in an active manner. The extended Kalman filter relinearizes the nonlinear system about each new estimate as it becomes available; i.e., at k = 0, the system is linearized about x̂(0|0). Once z(1) is processed by the EKF, so that x̂(1|1) is obtained, the system is linearized about x̂(1|1). By "linearize about x̂(1|1)," we mean that x̂(1|1) is used to calculate all the quantities needed to make the transition from x̂(1|1) to x̂(2|1), and subsequently to x̂(2|2). This phrase will become clear below. The purpose of relinearizing about the filter's output is to use a better reference trajectory for x*(t). Doing this, δx = x - x̂ will be held as small as possible, so that our linearization assumptions are less likely to be violated than in the case of the relinearized KF.

The EKF is developed below in predictor-corrector format (Jazwinski, 1970). Its prediction equation is obtained by integrating the nominal differential equation for x*(t), from tk to tk+1. In order to do this, we need to know how to choose x*(t) for the entire interval of time t ∈ [tk, tk+1]. Thus far, we have only mentioned how x*(t) is chosen at tk, i.e., as x̂(k|k).

Theorem 25-1. As a consequence of relinearizing about x̂(k|k) (k = 0, 1, ...),

δx̂(t|tk) = 0 for all t ∈ [tk, tk+1]   (25-15)

This means that

x*(t) = x̂(t|tk) for all t ∈ [tk, tk+1]   (25-16)

Before proving this important result, we observe that it provides us with a choice of x*(t) over the entire interval of time t ∈ [tk, tk+1], and it states that at the left-hand side of this time interval x*(tk) = x̂(k|k), whereas at the right-hand side of this time interval x*(tk+1) = x̂(k + 1|k). The transition from x̂(k + 1|k) to x̂(k + 1|k + 1) will be made using the EKF's correction equation.

Proof. Let t1 be an arbitrary value of t lying in the interval between tk and tk+1 (see Figure 25-1). For the purposes of this derivation, we can assume that δu(k) = 0 [i.e., perturbation input δu(k) takes on no new values in the interval from tk to tk+1; recall the piecewise-constant assumption made about u(t) in the derivation of (24-38)], i.e.,

δx(k + 1) = Φ(k + 1,k;*)δx(k) + wd(k)   (25-17)

Using our general state-predictor results given in (16-14), we see that (remember that k is short for tk, and that tk+1 - tk does not have to be a constant; this is true in all of our predictor, filter, and smoother formulas)

δx̂(t1|tk) = Φ(t1,tk;*)δx̂(tk|tk) = Φ(t1,tk;*)[x̂(k|k) - x*(k)]   (25-18)

Because we relinearize about the filtered estimate, x*(k) = x̂(k|k); hence, the right-hand side of (25-18) is zero, which establishes (25-15).

Figure 25-1 Nominal state trajectory x*(t).
One of the earliest applications of the extended Kalman filter was to parameter estimation (Kopp and Orford, 1963). Consider a continuous-time linear system whose matrices A and H contain some unknown parameters; our objective is to estimate these parameters from the measurements z(ti) as they become available.

To begin, we assume differential equation models for the unknown parameters: either constant models, in which the parameter derivatives are zero, or noise-driven models, in which white noise processes ni(t) and qj(t) drive the parameter derivatives. In the latter models one often chooses the deterministic terms ci = 0 and dj = 0. The noises ni(t) and qj(t) introduce uncertainty about the constancy of the ai and hj parameters.

When these parameter models are augmented to the original linear system, the augmented model is nonlinear and is of the form (25-9) and (25-10), which means we have reduced the problem of parameter estimation in a linear system to state estimation in a nonlinear system.

It has been shown, however, that EKF parameter estimates need not converge to their true values, and that another term must be added to the EKF corrector equation in order to guarantee convergence.

Example 25-1
Consider the satellite and planet Example 24-1, in which the satellite's motion is governed by the two equations

r̈ = rθ̇² - γ/r² + (1/m)u_r(t)   (25-34)

θ̈ = -2ṙθ̇/r + (1/mr)u_θ(t)   (25-35)

We shall assume that m and γ are unknown constants, and shall model them as

ṁ(t) = 0   (25-36)

γ̇(t) = 0   (25-37)

Now, these two models are augmented to the satellite's state equations, so that m and γ are treated as additional state variables, and the EKF can be applied to the augmented nonlinear model.

We note, finally, that the modeling and augmentation approach to parameter estimation, described above, is not restricted to continuous-time linear systems. Additional situations are described in the exercises.

Figure 25-2 Iterated EKF. All of the calculations provide us with a refined estimate of x̂(k + 1|k + 1).

PROBLEMS

25-3. Consider a system in which x1(t) is the actual pitch rate. Sampled measurements are made of the pitch rate, i.e.,

z(ti) = x1(ti) + v(ti)   i = 1, 2, ..., N

Noise v(ti) is white and Gaussian, and its variance is given. The control signal u(t) is the sum of a desired control signal u*(t) and additive noise, i.e., u(t) = u*(t) + δu(t). The additive noise δu(t) is a normally distributed random variable modulated by a function of the desired control signal, i.e.,

δu(t) = S[u*(t)]w0(t)

where w0(t) is zero-mean white noise with intensity σ²_{w0}. Parameters a1(t), a2(t), and a3(t) may be unknown and are modeled as

ȧi(t) = αi(t)[āi(t) - ai(t)] + wi(t)   i = 1, 2, 3

In this model the parameters αi(t) are assumed given, as are the a priori values of ai(t) and āi(t), and the wi(t) are zero-mean white noises with intensities σ²_{wi}.
(a) What are the EKF formulas for estimation of x1, x2, a1, and a2, assuming that a3 is known?
(b) Repeat (a) but now assume that a3 is unknown.
25-4. Suppose we begin with the nonlinear discrete-time system

x(k + 1) = f[x(k),k] + w(k)

z(k) = h[x(k),k] + v(k)   k = 1, 2, ...

Develop the EKF for this system [Hint: expand f[x(k),k] and h[x(k),k] in Taylor series about x̂(k|k) and x̂(k|k - 1), respectively].
25-5. Refer to Problem 24-7. Obtain the EKF for
(a) the equation for the unsteady operation of a synchronous motor, in which C and p are unknown;
(b) Duffing's equation, in which C, α, and β are unknown;
(c) Van der Pol's equation, in which ε is unknown; and
(d) Hill's equation, in which a and b are unknown.
Lesson 26
Maximum-Likelihood State and Parameter Estimation

INTRODUCTION

In Lesson 11 we studied the problem of obtaining maximum-likelihood estimates of a collection of parameters, θ = col(elements of Φ, Ψ, H, and R), that appear in the state-variable model

x(k + 1) = Φx(k) + Ψu(k)   (26-1)

z(k + 1) = Hx(k + 1) + v(k + 1)   k = 0, 1, ..., N - 1   (26-2)

We determined the log-likelihood function to be

L(θ|Z) = -(1/2) Σ[i=1,N] [z(i) - Hθxθ(i)]'Rθ⁻¹[z(i) - Hθxθ(i)] - (N/2) ln |Rθ|   (26-3)

where quantities that are subscripted θ denote a dependence on θ. Finally, we pointed out that the state equation (26-1), written as

xθ(k + 1) = Φθxθ(k) + Ψθu(k)   xθ(0) known   (26-4)

acts as a constraint that is associated with the computation of the log-likelihood function. Parameter vector θ must be determined by maximizing L(θ|Z) subject to the constraint (26-4). This can only be done using mathematical programming techniques (i.e., an optimization algorithm such as steepest descent or Marquardt-Levenberg).

In this lesson we study the problem of obtaining maximum-likelihood estimates of the parameters that appear in our basic state-variable model, (26-5) and (26-6); these parameters must, of course, be identifiable. Before we can determine θ̂_ML we must establish the log-likelihood function for our basic state-variable model.

A LOG-LIKELIHOOD FUNCTION FOR THE BASIC STATE-VARIABLE MODEL

The measurements are all correlated due to the presence of either the process noise, w(k), or random initial conditions, or both. This represents the major difference between our basic state-variable model, (26-5) and (26-6), and the state-variable model studied earlier, in (26-1) and (26-2). Fortunately, the measurements and innovations are causally invertible, and the innovations are all uncorrelated, so that it is still relatively easy to determine the log-likelihood function for the basic state-variable model.

Theorem 26-1. The log-likelihood function for our basic state-variable model in (26-5) and (26-6) is

L(θ|Z) = -(1/2) Σ[j=1,N] [z̃θ'(j|j - 1)Nθ⁻¹(j|j - 1)z̃θ(j|j - 1) + ln |Nθ(j|j - 1)|]   (26-9)

where z̃θ(j|j - 1) is the innovations process, and Nθ(j|j - 1) is the covariance of that process [in Lesson 16 we used the symbol Pz̃(j|j - 1) for this covariance],

Nθ(j|j - 1) = HθPθ(j|j - 1)Hθ' + Rθ   (26-10)

This theorem is also applicable to either time-varying or nonstationary systems or both. Within the structure of these more complicated systems there must still be a collection of unknown but constant parameters. It is these parameters that are estimated by maximizing L(θ|Z).
Proof (Mendel, 1983b, pp. 101-103). We must first obtain the joint density function p(Z|θ) = p(z(1), ..., z(N)|θ). In Lesson 17 we saw that the innovations process z̃(i|i - 1) and measurement z(i) are causally invertible; thus, the density function

p(z̃(1|0), z̃(2|1), ..., z̃(N|N - 1)|θ)

contains the same data information as p(z(1), ..., z(N)|θ) does. Consequently, L(θ|Z) can be replaced by L(θ|Z̃), where

Z̃ = col(z̃(1|0), ..., z̃(N|N - 1))   (26-11)

and

L(θ|Z̃) = ln p(z̃(1|0), ..., z̃(N|N - 1)|θ)   (26-12)

Now, however, we use the fact that the innovations process is Gaussian white noise to express p(Z̃|θ) as a product of marginal densities. For our basic state-variable model, the innovations are Gaussian distributed, which means that each marginal density p(z̃(j|j - 1)|θ) is Gaussian, j = 1, ..., N; hence,

L(θ|Z̃) = ln Π[j=1,N] p(z̃(j|j - 1)|θ)   (26-14)

From part (b) of Theorem 16-2 in Lesson 16 we know that z̃(j|j - 1) is zero mean with covariance N(j|j - 1)   (26-15)

Substitute (26-15) into (26-14) to show that

L(θ|Z̃) = -(1/2) Σ[j=1,N] [z̃'(j|j - 1)N⁻¹(j|j - 1)z̃(j|j - 1) + ln |N(j|j - 1)|]

which is (26-9).

In the present situation, where the true values of θ, θ_T, are not known but are being estimated, the estimate of x(i) obtained from a Kalman filter will be suboptimal due to wrong values of θ being used by that filter. In fact, we must use θ̂_ML in the implementation of the Kalman filter, because θ̂_ML will be the best information available about θ_T at tj. If θ̂_ML → θ_T as N → ∞, then x̂_θ̂ML(j|j - 1) → x̂_θT(j|j - 1) as N → ∞, and the suboptimal Kalman filter will approach the optimal Kalman filter. This result is about the best that one can hope for in maximum-likelihood estimation of parameters in our basic state-variable model.

Note also that, although we began with a parameter estimation problem, we wound up with a simultaneous state and parameter estimation problem. This is due to the uncertainties present in our state equation, which necessitated state estimation using a Kalman filter.

ON COMPUTING θ̂_ML

How do we determine θ̂_ML for L(θ|Z) given in (26-9) (subject to the constraint of the Kalman filter)? No simple closed-form solution is possible, because θ enters into L(θ|Z) in a complicated nonlinear manner. The only way presently known to obtain θ̂_ML is by means of mathematical programming.

The most effective optimization methods to determine θ̂_ML require the computation of the gradient of L(θ|Z) as well as the Hessian matrix, or a pseudo-Hessian matrix, of L(θ|Z). The Marquardt-Levenberg algorithm (also known as the Levenberg-Marquardt algorithm [Bard, 1970; Marquardt, 1963]), for example, has the form

θ̂(i+1)_ML = θ̂(i)_ML - (Hi + λiI)⁻¹gi   i = 0, 1, ...   (26-17)

where gi denotes the gradient of L(θ|Z) and Hi the Hessian (or pseudo-Hessian) matrix, both evaluated at θ̂(i)_ML, and λi is the Marquardt-Levenberg parameter.
We direct our attention to the calculation of g_i and H_i. The gradient of L(θ|Z) will require the calculation of ∂z̃(j|j−1)/∂θ_i and ∂N(j|j−1)/∂θ_i. The innovations depend upon x̂(j|j−1); hence, in order to compute ∂z̃(j|j−1)/∂θ, we must compute ∂x̂(j|j−1)/∂θ. A Kalman filter must be used to compute x̂(j|j−1); but this filter requires the following sequence of calculations: P(k|k) → P(k+1|k) → K(k+1) → x̂(k+1|k) → x̂(k+1|k+1). Taking the partial derivative of the prediction equation with respect to θ_i, we find that

∂x̂(k+1|k)/∂θ_i = Φ ∂x̂(k|k)/∂θ_i + [∂Φ/∂θ_i] x̂(k|k) + [∂Ψ/∂θ_i] u(k),  i = 1, 2, ..., d   (26-20)

We see that to compute ∂x̂(k+1|k)/∂θ_i, we must also compute ∂x̂(k|k)/∂θ_i. Taking the partial derivative of the correction equation with respect to θ_i, we find that

∂x̂(k+1|k+1)/∂θ_i = ∂x̂(k+1|k)/∂θ_i + [∂K(k+1)/∂θ_i][z(k+1) − H x̂(k+1|k)]
  − K(k+1){[∂H/∂θ_i] x̂(k+1|k) + H ∂x̂(k+1|k)/∂θ_i},  i = 1, 2, ..., d   (26-21)

The quantities computed by the Kalman filter are used by the sensitivity equations; hence, the Kalman filter must be run together with the d sets of sensitivity equations. This procedure for recursively calculating the gradient ∂L(θ|Z)/∂θ therefore requires about as much computation as d + 1 Kalman filters. The sensitivity systems are totally uncoupled and lend themselves quite naturally to parallel processing (see Figure 26-1).

The Hessian matrix of L(θ|Z) is quite complicated, involving not only first derivatives of z̃_θ(j|j−1) and N_θ(j|j−1), but also their second derivatives. The pseudo-Hessian matrix of L(θ|Z) ignores all the second-derivative terms; hence, it is relatively easy to compute, because all the first-derivative terms have already been computed in order to calculate the gradient of L(θ|Z). Justification for neglecting the second-derivative terms is given by Gupta and Mehra (1974), who show that as θ̂_ML approaches the true value of θ, the expected value of the dropped terms goes to zero.

The estimation literature is filled with many applications of maximum-likelihood state and parameter estimation. For example, Mendel (1983b) applies it to seismic data processing, Mehra and Tyler (1973) apply it to aircraft parameter identification, and McLaughlin (1980) applies it to groundwater flow.
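For a scalar system the filter-plus-sensitivity recursion can be written out in a few lines. The sketch below (illustrative model values, not from the lesson) propagates the scalar analogues of (26-20) and (26-21), together with the sensitivities of P, N, and K, and checks the resulting gradient of the innovations log-likelihood against a central finite difference.

```python
import numpy as np

def kf_loglike_and_grad(a, z, q=1.0, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x(k+1) = a x(k) + w(k), z(k) = x(k) + v(k).
    Returns the innovations log-likelihood L(a) of (26-16) and dL/da, obtained
    by running the filter together with its sensitivity equations (the scalar
    analogues of (26-20) and (26-21))."""
    xc, pc = x0, p0        # corrected estimate and covariance
    dxc, dpc = 0.0, 0.0    # their sensitivities with respect to a
    L, dL = 0.0, 0.0
    for zk in z:
        # prediction and its sensitivity
        xp, dxp = a * xc, xc + a * dxc
        pp, dpp = a * a * pc + q, 2.0 * a * pc + a * a * dpc
        # innovations
        N, dN = pp + r, dpp
        zt, dzt = zk - xp, -dxp
        # gain and its sensitivity: d(pp/N)/da = dpp * r / N^2 since dN = dpp
        K, dK = pp / N, dpp * r / N**2
        # correction and its sensitivity
        xc = xp + K * zt
        dxc = dxp + dK * zt + K * dzt
        pc = (1.0 - K) * pp
        dpc = -dK * pp + (1.0 - K) * dpp
        # accumulate L and dL/da
        L += -0.5 * (zt * zt / N + np.log(N))
        dL += -0.5 * (2.0 * zt * dzt / N - zt * zt * dN / N**2 + dN / N)
    return L, dL

# simulate data and check the sensitivity gradient against a central difference
rng = np.random.default_rng(1)
a_true, n = 0.8, 500
x, zs = 0.0, []
for _ in range(n):
    x = a_true * x + rng.normal(0.0, 1.0)
    zs.append(x + rng.normal(0.0, np.sqrt(0.5)))

a0, eps = 0.7, 1e-6
L, dL = kf_loglike_and_grad(a0, zs)
Lp, _ = kf_loglike_and_grad(a0 + eps, zs)
Lm, _ = kf_loglike_and_grad(a0 - eps, zs)
print(dL, (Lp - Lm) / (2 * eps))  # the two gradients agree
```

Here d = 1, so one filter plus one sensitivity system is run, matching the "about d + 1 Kalman filters" cost noted above.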
A STEADY-STATE APPROXIMATION

Suppose our basic state-variable model is time-invariant and stationary so that P̄ = lim_{j→∞} P(j|j−1) exists. Let

N̄ = H P̄ H' + R   (26-22)

Log-likelihood function L(φ|Z) is a steady-state approximation of L(θ|Z). The steady-state Kalman filter used to compute z̃(j|j−1) is

x̂(k+1|k) = Φ x̂(k|k) + Ψ u(k)   (26-24)
x̂(k+1|k+1) = x̂(k+1|k) + K̄[z(k+1) − H x̂(k+1|k)]   (26-25)

in which K̄ is the steady-state Kalman gain matrix.

Recall that

θ = col (elements of Φ, Γ, Ψ, H, Q, and R)   (26-26)

We now make the following transformations of variables:

φ = col (elements of Φ, Ψ, H, K̄, and N̄)   (26-27)

where φ is p × 1, and view L as a function of φ, i.e.,

L(φ|Z) = −(1/2) Σ_{j=1}^{N} z̃'(j|j−1) N̄^{-1} z̃(j|j−1) − (N/2) ln |N̄|   (26-29)

Instead of finding θ̂_ML that maximizes L(θ|Z), subject to the constraints of a full-blown Kalman filter, we now propose to find φ̂_ML that maximizes L(φ|Z), subject to the constraints of the following filter:

x̂(k+1|k) = Φ x̂(k|k) + Ψ u(k)   (26-30)
x̂(k+1|k+1) = x̂(k+1|k) + K̄ z̃(k+1|k)   (26-31)
z̃(k+1|k) = z(k+1) − H x̂(k+1|k)   (26-32)

Once we have computed φ̂_ML we can compute θ̂_ML by inverting the transformations in (26-27). Of course, when we do this, we are also using the invariance property of maximum-likelihood estimates.

Observe that L(φ|Z) in (26-29) and the filter in (26-30)-(26-32) do not depend on Γ; hence, we have not included Γ in any definition of φ. We explain how to reconstruct Γ from φ̂_ML following Equation (26-44).

Because maximum-likelihood estimates are asymptotically efficient (Lesson 11), once we have determined φ̂_ML, the filter in (26-30) and (26-31) will be the steady-state Kalman filter.

The major advantage of this steady-state approximation is that the filter sensitivity equations are greatly simplified. When K̄ and N̄ are treated as matrices of unknown parameters, we do not need the predicted and corrected error-covariance matrices to compute K̄ and N̄. The sensitivity equations for (26-32), (26-30), and (26-31) are

∂z̃(k+1|k)/∂φ_i = −[∂H/∂φ_i] x̂(k+1|k) − H ∂x̂(k+1|k)/∂φ_i   (26-33)
∂x̂(k+1|k)/∂φ_i = Φ ∂x̂(k|k)/∂φ_i + [∂Φ/∂φ_i] x̂(k|k) + [∂Ψ/∂φ_i] u(k)   (26-34)

together with the companion equation for ∂x̂(k+1|k+1)/∂φ_i, where i = 1, 2, ..., p. Note that ∂Φ/∂φ_i is zero for all φ_i not in Φ and is a matrix filled with zeros and a single unity value for φ_i in Φ.

There are more elements in φ than in θ, because K̄ and N̄ have more unknown elements in them than do Q and R, i.e., p > d. Additionally, N̄ does not appear in the filter equations; it only appears in L(φ|Z). It is, therefore, possible to obtain a closed-form solution for N̄_ML.

Theorem 26-2. A closed-form solution for matrix N̄_ML is

N̄_ML = (1/N) Σ_{j=1}^{N} z̃(j|j−1) z̃'(j|j−1)   (26-36)
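A quick numerical illustration of Theorem 26-2 (a sketch with illustrative model values): run the fixed-gain filter (26-30)-(26-32) on simulated scalar data using the true steady-state gain, and the sample moment (26-36) approximates the steady-state innovations covariance N̄ of (26-22).

```python
import numpy as np

# Scalar model x(k+1) = a x(k) + w(k), z(k) = x(k) + v(k); illustrative values.
a, q, r = 0.9, 1.0, 0.5

# steady-state predicted covariance from the Riccati recursion, iterated to convergence
p = 1.0
for _ in range(1000):
    p = a * a * (1.0 - p / (p + r)) * p + q
N_bar = p + r          # steady-state innovations covariance (26-22)
K_bar = p / N_bar      # steady-state gain

# simulate, run the fixed-gain filter (26-30)-(26-32), and apply (26-36)
rng = np.random.default_rng(2)
n, x, xc = 20000, 0.0, 0.0
innov = np.empty(n)
for k in range(n):
    x = a * x + rng.normal(0.0, np.sqrt(q))
    z = x + rng.normal(0.0, np.sqrt(r))
    xp = a * xc                 # prediction (26-30)
    innov[k] = z - xp           # innovation (26-32)
    xc = xp + K_bar * innov[k]  # correction (26-31)

N_ml = np.mean(innov**2)        # closed-form solution (26-36)
print(N_bar, N_ml)              # the sample value approximates N_bar
```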
266   Maximum-Likelihood State and Parameter Estimation   Lesson 26
Proof. To determine N̄_ML we must set ∂L(φ|Z)/∂N̄ = 0 and solve the resulting equation for N̄_ML. This is most easily accomplished by applying gradient matrix formulas to (26-29) that are given in Schweppe (1974). Doing this, we obtain

−(N/2) N̄^{-1} + (1/2) N̄^{-1} [Σ_{j=1}^{N} z̃(j|j−1) z̃'(j|j−1)] N̄^{-1} = 0   (26-37)

whose solution is N̄_ML in (26-36). □

Observe that N̄_ML is the sample steady-state covariance matrix of z̃; i.e., as N → ∞,

N̄_ML → lim_{j→∞} cov [z̃(j|j−1)]

Suppose we are also interested in determining Q̂_ML and R̂_ML. How do we obtain these quantities from φ̂_ML? As in Lesson 19, we let K̄, P̄, and P̄₁ denote the steady-state values of K(k+1), P(k+1|k), and P(k|k), respectively, where

K̄ = P̄H'(HP̄H' + R)^{-1} = P̄₁H'R^{-1}   (26-38)
P̄ = Φ P̄₁ Φ' + Γ Q Γ'   (26-39)

and

P̄₁ = (I − K̄H) P̄   (26-40)

Additionally, we know that

N̄ = H P̄ H' + R   (26-41)

By the invariance property of maximum-likelihood estimates, these relations must also hold among the maximum-likelihood estimates. The resulting equations must be solved simultaneously for P̄ and (ΓQΓ')_ML using iterative numerical techniques. For details, see Mehra (1970a).

Note, finally, that the best for which we can hope by this approach is not Γ̂_ML and Q̂_ML, but only (ΓQΓ')_ML. This is due to the fact that, when Γ and Q are both unknown, there will be an ambiguity in their determination; i.e., the term Γw(k) which appears in our basic state-variable model [for which E{w(k)w'(k)} = Q] cannot be distinguished from the term w₁(k), for which

E{w₁(k)w₁'(k)} = Q₁ = Γ Q Γ'   (26-47)

This observation is also applicable to the original problem formulation wherein we obtained θ̂_ML directly; i.e., when both Γ and Q are unknown, we should really choose

θ = col (elements of Φ, Ψ, H, ΓQΓ', and R)   (26-48)

In summary, when our basic state-variable model is time-invariant and stationary, we can first obtain φ̂_ML by maximizing L(φ|Z) given in (26-29), subject to the constraints of the simple filter in (26-30), (26-31), and (26-32). A mathematical programming method must be used to obtain those elements of φ̂_ML associated with Φ̂_ML, Ψ̂_ML, Ĥ_ML, and K̄_ML. The closed-form solution, given in (26-36), is used for N̄_ML. Finally, if we want to reconstruct R̂_ML and (ΓQΓ')_ML, we use (26-44) for the former and must solve (26-45) and (26-46) for the latter.

Example 26-1 (Mehra, 1971)
The following fourth-order system, which represents the short-period dynamics and the first bending mode of a missile, was simulated. [The simulated system matrices are not reproduced legibly here; the state matrix is in companion form, with last row (−α₁, −α₂, −α₃, −α₄).]
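For a scalar model with H = 1, the relations (26-38)-(26-41) can be inverted in closed form, which makes the reconstruction of R̂ and Q̂ from (â, K̄, N̄) easy to illustrate. The numerical values below are hypothetical estimates, not taken from Mehra's example.

```python
import numpy as np

# Suppose the steady-state approximation has produced estimates (illustrative values)
a_hat = 0.9           # state matrix estimate
K_hat = 0.72          # steady-state gain estimate
N_hat = 1.79          # innovations covariance estimate, from (26-36)

# Scalar versions of (26-38)-(26-41) with H = 1:
#   K = P/N and N = P + R give P = K N and R = N - P;
#   P1 = (1 - K) P by (26-40), and Q = P - a^2 P1 by (26-39) with Gamma = 1.
P_hat = K_hat * N_hat
R_hat = N_hat - P_hat
P1_hat = (1.0 - K_hat) * P_hat
Q_hat = P_hat - a_hat**2 * P1_hat
print(R_hat, Q_hat)

# Consistency check: the recovered (Q, R) reproduce the same steady state
p = 1.0
for _ in range(2000):
    p = a_hat**2 * (1.0 - p / (p + R_hat)) * p + Q_hat
print(p, P_hat)   # the Riccati fixed point matches P_hat
```

The check works for any gain in (0, 1) because 1 − K = R/N exactly when K = P/N, so the reconstructed pair (Q̂, R̂) is automatically consistent with the assumed steady state.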
[Table 26-1, "Parameter Estimates for Missile Example," is not reproduced legibly here. It compares parameter estimates obtained using 100 and 1000 data points with the true values, results from Mehra's (1970b) correlation technique, and estimates of the standard deviations. Source: Mehra (1971, p. 301), © 1971 AIAA.]

PROBLEMS

26-1. Obtain the sensitivity equations for ∂K(k+1)/∂θ_i, ∂P(k+1|k)/∂θ_i, and ∂P(k+1|k+1)/∂θ_i. Explain why the sensitivity system for ∂x̂(k+1|k)/∂θ_i and ∂x̂(k+1|k+1)/∂θ_i is linear.
26-2. Compute formulas for g_i and H_i. Then simplify H_i to a pseudo-Hessian.
26-3. In the first-order system x(k+1) = ax(k) + w(k) and z(k+1) = x(k+1) + v(k+1), k = 1, 2, ..., N, a is an unknown parameter that is to be estimated. Sequences w(k) and v(k) are, as usual, mutually uncorrelated and white, and w(k) ~ N(w(k); 0, 1) and v(k) ~ N(v(k); 0, 1/2). Explain, using equations and a flowchart, how parameter a can be estimated using a MLE.
26-4. Repeat the preceding problem, where all conditions are the same except that now w(k) and v(k) are correlated, and E{w(k)v(k)} = 1/4.
26-5. We are interested in estimating the parameters a and r in the following first-order system:

x(k+1) + ax(k) = w(k)
z(k) = x(k) + v(k),  k = 1, 2, ..., N

Signals w(k) and v(k) are mutually uncorrelated, white, and Gaussian, and E{w²(k)} = 1 and E{v²(k)} = r.
(a) Let θ = col (a, r). What is the equation for the log-likelihood function?
(b) Prepare a macro flowchart that depicts the sequence of calculations required to maximize L(θ|Z). Assume an optimization algorithm is used which requires gradient information about L(θ|Z).
(c) Write out the Kalman filter sensitivity equations for parameters a and r.
26-6. Develop the sensitivity equations for the case considered in Lesson 11, i.e., for the case where the only uncertainty present in the state-variable model is measurement noise. Begin with L(θ|Z) in (11-42).
26-7. Refer to Problem 24-7. Explain, using equations and a flowchart, how to obtain MLEs of the unknown parameters for:
(a) the equation for the unsteady operation of a synchronous motor, in which C and p are unknown;
(b) Duffing's equation, in which C, α, and β are unknown;
(c) Van der Pol's equation, in which ε is unknown; and
(d) Hill's equation, in which a and b are unknown.
LESSON 27  KALMAN-BUCY FILTERING
INTRODUCTION

The Kalman-Bucy filter is the continuous-time counterpart to the Kalman filter. It is a continuous-time minimum-variance filter that provides state estimates for continuous-time dynamical systems that are described by linear, (possibly) time-varying, and (possibly) nonstationary ordinary differential equations.

The Kalman-Bucy filter (KBF) can be derived in a number of different ways, including the following three:

1. Use a formal limiting procedure to obtain the KBF from the KF (e.g., Meditch, 1969).
2. Begin by assuming the optimal estimator is a linear transformation of all measurements. Use a calculus of variations argument or the orthogonality principle to obtain the Wiener-Hopf integral equation. Embedded within this equation is the filter kernel. Take the derivative of the Wiener-Hopf equation to obtain a differential equation which is the KBF (Meditch, 1969).
3. Begin by assuming a linear differential equation structure for the KBF, one that contains an unknown time-varying gain matrix that weights the difference between the measurement made at time t and the estimate of that measurement. Choose the gain matrix that minimizes the mean-squared error (Athans and Tse, 1967).

We shall briefly describe the first and third approaches, but first we must define our continuous-time model and formally state the problem we wish to solve. The process and measurement noises are uncorrelated, i.e.,

E{w(t)v'(τ)} = 0   (27-5)

Equations (27-3), (27-4), and (27-5) apply for t ≥ t₀. Additionally, R(t) is continuous and positive definite, whereas Q(t) is continuous and positive semidefinite. Finally, we assume that the initial state vector x(t₀) may be random, and, if it is, it is uncorrelated with both w(t) and v(t). The statistics of a random x(t₀) are

E{x(t₀)} = m_x(t₀)   (27-6)

and

cov [x(t₀)] = P_x(t₀)   (27-7)

Measurements z(t) are assumed to be made for t₀ ≤ t ≤ τ.

If x(t₀), w(t), and v(t) are jointly Gaussian for all t ∈ [t₀, τ], then the KBF will be the optimal estimator of state vector x(t). We will not make any distributional assumptions about x(t₀), w(t), and v(t) in this lesson, being content to establish the linear optimal estimator of x(t).

NOTATION AND PROBLEM STATEMENT

Our notation for a continuous-time estimate of x(t) and its associated estimation error parallels our notation for the comparable discrete-time quantities; i.e., x̂(t|t) denotes the optimal estimate of x(t) which uses all the measurements z(τ), t₀ ≤ τ ≤ t, and

x̃(t|t) = x(t) − x̂(t|t)   (27-8)
THE KALMAN-BUCY FILTER

The solution to the problem stated in the preceding section is the Kalman-Bucy filter, the structure of which is summarized in the following:

Theorem 27-1. The KBF is described by the vector differential equation

dx̂(t|t)/dt = F(t)x̂(t|t) + K(t)[z(t) − H(t)x̂(t|t)]   (27-10)

where t ≥ t₀, x̂(t₀|t₀) = m_x(t₀),

K(t) = P(t|t)H'(t)R^{-1}(t)   (27-11)

and

Ṗ(t|t) = F(t)P(t|t) + P(t|t)F'(t) − P(t|t)H'(t)R^{-1}(t)H(t)P(t|t) + G(t)Q(t)G'(t)   (27-12)

Equation (27-12), which is a matrix Riccati differential equation, is initialized by P(t₀|t₀) = P_x(t₀). □

Matrix K(t) is the Kalman-Bucy gain matrix, and P(t|t) is the state-estimation-error covariance matrix, i.e.,

P(t|t) = E{x̃(t|t)x̃'(t|t)}   (27-13)

Equation (27-10) can be rewritten as

dx̂(t|t)/dt = [F(t) − K(t)H(t)]x̂(t|t) + K(t)z(t)   (27-14)

which makes it very clear that the KBF is a time-varying filter that processes the measurements linearly to produce x̂(t|t). The solution to (27-14) is ...

The second approach to deriving the KBF, mentioned in the introduction to this lesson, begins by assuming that x̂(t|t) can be expressed as in (27-17), where A(t, τ) is unknown. The mean-squared estimation error is minimized to obtain the following Wiener-Hopf integral equation:

E{x(t)z'(τ)} − ∫_{t₀}^{t} A(t, σ)E{z(σ)z'(τ)} dσ = 0   (27-19)

where t₀ ≤ τ ≤ t. When this equation is converted into a differential equation, one obtains the KBF described in Theorem 27-1. For the details of this derivation of Theorem 27-1, see Meditch, 1969, Chapter 8.

DERIVATION OF KBF USING A FORMAL LIMITING PROCEDURE

Kalman filter Equation (17-11), expressed as

x̂(k+1|k+1) = Φ(k+1, k)x̂(k|k) + K(k+1)[z(k+1) − H(k+1)Φ(k+1, k)x̂(k|k)]   (27-20)

can also be written as

x̂(t+Δt|t+Δt) = Φ(t+Δt, t)x̂(t|t) + K(t+Δt)[z(t+Δt) − H(t+Δt)Φ(t+Δt, t)x̂(t|t)]   (27-21)

where we have let t_k = t and t_{k+1} = t + Δt. In Example 24-3 we showed that

Φ(t+Δt, t) = I + F(t)Δt + O(Δt²)   (27-22)

and

Q_d(t) = G(t)Q(t)G'(t)Δt + O(Δt²)   (27-23)

Observe that Q_d(t) can also be written as ...
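A scalar sketch of Theorem 27-1 (illustrative model values): integrating the Riccati equation (27-12) with simple Euler steps drives P(t|t) to the root of the corresponding algebraic equation, from which the gain (27-11) follows.

```python
import numpy as np

# Scalar KBF covariance equation (27-12): dP/dt = 2 f P - P^2 / r + q,
# for a model with H = 1 and G = 1; f, q, r are illustrative values.
f, q, r = -1.0, 1.0, 1.0
dt, P = 1e-4, 2.0            # step size and initial covariance P(t0|t0)
for _ in range(200000):      # integrate over 20 time units
    P += dt * (2.0 * f * P - P * P / r + q)
K = P / r                    # Kalman-Bucy gain (27-11)

# steady-state value from the algebraic equation 2 f P - P^2/r + q = 0
P_ss = r * (f + np.sqrt(f * f + q / r))
print(P, P_ss, K)
```

Because f < 0 the covariance decays toward its steady-state value, so the numerically integrated P essentially coincides with the positive root of the algebraic equation.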
Our objective is to find the matrix function K(τ), t₀ ≤ τ ≤ t, that minimizes the following mean-squared error:

J[K(τ)] = E{e'(τ)e(τ)}   (27-40)

where

e(τ) = x(τ) − x̂(τ|τ)   (27-41)

Substituting (27-45) into (27-46), we see that

ℋ(K, P, Σ) = tr [F(t)P(t|t)Σ'(t)] − tr [K(t)H(t)P(t|t)Σ'(t)] + tr [P(t|t)F'(t)Σ(t)] − tr [P(t|t)H'(t)K'(t)Σ(t)] + ...   (27-47)

Setting the appropriate partial derivatives of ℋ to zero leads to three algebraic equations. Because Σ* > 0, (Σ*)^{-1} exists, so that (27-56) has for its only solution the Kalman-Bucy gain matrix of (27-11).

STEADY-STATE KBF

If our continuous-time system is time-invariant and stationary, then, when certain system-theoretic conditions are satisfied (see, e.g., Kwakernaak and Sivan, 1972), Ṗ(t|t) → 0, in which case P(t|t) has a steady-state value, denoted P̄. In this case, K(t) → K̄, where

K̄ = P̄H'R^{-1}   (27-58)

P̄ is the solution of the algebraic Riccati equation

FP̄ + P̄F' − P̄H'R^{-1}HP̄ + GQG' = 0   (27-59)

For the double-integrator example, it is straightforward to show that the unique solution of these nonlinear algebraic equations, for which P̄ > 0, is

p̄₁₂ = (qr)^{1/2}   (27-66a)
p̄₁₁ = √2 q^{1/4} r^{3/4}   (27-66b)
p̄₂₂ = √2 q^{3/4} r^{1/4}   (27-66c)

The steady-state KB gain matrix is computed from (27-58) as

K̄ = col (√2 (q/r)^{1/4}, (q/r)^{1/2})   (27-67)

The eigenvalues of the steady-state filter matrix F − K̄H are solutions of the equation

s² + √2 (q/r)^{1/4} s + (q/r)^{1/2} = 0   (27-72)
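The closed-form solution (27-66) is easy to verify numerically: substituting it into the algebraic Riccati equation (27-59) for the double-integrator model gives a zero residual, and (27-58) reproduces the gain (27-67). The values of q and r below are arbitrary.

```python
import numpy as np

# Double-integrator model: F = [[0,1],[0,0]], G = [0,1]', H = [1,0],
# process noise intensity q, measurement noise intensity r (arbitrary values).
q, r = 2.0, 0.5
F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

# closed-form steady-state covariance (27-66)
p11 = np.sqrt(2.0) * q**0.25 * r**0.75
p12 = np.sqrt(q * r)
p22 = np.sqrt(2.0) * q**0.75 * r**0.25
P = np.array([[p11, p12], [p12, p22]])

# residual of the algebraic Riccati equation (27-59)
residual = F @ P + P @ F.T - P @ H.T @ H @ P / r + q * G @ G.T

# steady-state gain (27-58), which should equal col(sqrt(2)(q/r)^{1/4}, (q/r)^{1/2})
K = P @ H.T / r
print(np.abs(residual).max(), K.ravel())
```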
we find that

ω_n = (q/r)^{1/4}   (27-73)

and

ζ = 1/√2 ≈ 0.707   (27-74)

thus, the steady-state KBF for the simple double-integrator system is damped at 0.707. The filter's poles lie on the 45° lines depicted in Figure 27-1. They can be moved along these lines by adjusting the ratio q/r; hence, once again, we may view q/r as a filter tuning parameter. □

Figure 27-1. Eigenvalues of the steady-state KBF lie along ±45-degree lines. Increasing q/r moves them farther away from the origin, whereas decreasing q/r moves them closer to the origin.

AN IMPORTANT APPLICATION FOR THE KBF

Consider the system

ẋ(t) = F(t)x(t) + B(t)u(t) + G(t)w(t),  x(t₀) = x₀   (27-75)

for t ≥ t₀, where x₀ is a random initial condition vector with mean m_x(t₀) and covariance matrix P_x(t₀). Measurements are given by

z(t) = H(t)x(t) + v(t)   (27-76)

for t ≥ t₀. The joint random process col [w(t), v(t)] is a white noise process with intensity diag [Q(t), R(t)].

The stochastic linear optimal output feedback regulator problem is the problem of finding the functional

u(t) = f[z(τ), t₀ ≤ τ ≤ t]   (27-78)

for t₀ ≤ t ≤ t₁ such that the objective function

J[u] = E{ (1/2) x'(t₁)W₁x(t₁) + (1/2) ∫_{t₀}^{t₁} [x'(t)D'(t)W₂(t)D(t)x(t) + u'(t)W₃(t)u(t)] dt }   (27-79)

is minimized. Here W₁, W₂, and W₃ are symmetric weighting matrices, and W₁ ≥ 0, W₂ > 0, and W₃ > 0 for t₀ ≤ t ≤ t₁.

In the control theory literature, this problem is also known as the linear-quadratic-Gaussian regulator problem (i.e., the LQG problem; see Athans, 1971, for example). We state the structure of the solution to this problem, without proof, next.

The optimal control, u*(t), which minimizes J[u] in (27-79) is

u*(t) = −F_c(t)x̂(t|t)   (27-80)

where F_c(t) is an optimal gain matrix, computed as

F_c(t) = W₃^{-1}(t)B'(t)P_c(t)   (27-81)

where P_c(t) is the solution of the control Riccati equation

−Ṗ_c(t) = F'(t)P_c(t) + P_c(t)F(t) − P_c(t)B(t)W₃^{-1}(t)B'(t)P_c(t) + D'(t)W₂(t)D(t),  P_c(t₁) given   (27-82)

and x̂(t|t) is the output of a KBF, properly modified to account for the control term in the state equation, i.e.,

dx̂(t|t)/dt = F(t)x̂(t|t) + B(t)u*(t) + K(t)[z(t) − H(t)x̂(t|t)]   (27-83)

We see that the KBF plays an essential role in the solution of the LQG problem.

PROBLEMS

27-1. Explain the replacement of covariance matrix R(k+1 = t+Δt) by R(t+Δt)/Δt in (27-27).
27-2. Show that lim_{Δt→0} P(t+Δt|t) = P(t|t).
27-3. Derive the state equation for error e(t), given in (27-44), and its associated covariance equation (27-45).
27-4. Prove that matrix Σ*(t) is symmetric and positive definite.
LESSON A  SUFFICIENT STATISTICS AND STATISTICAL ESTIMATION OF PARAMETERS*

CONCEPT OF SUFFICIENT STATISTICS

The notion of a sufficient statistic can be explained intuitively (Ferguson, 1967), as follows. We observe Z(N) (Z for short), where Z = col [z(1), z(2), ..., z(N)], in which z(1), ..., z(N) are independent and identically distributed random vectors, each having a density function p(z(i)|θ), where θ is unknown. Often the information in Z can be represented equivalently in a statistic, T(Z), whose dimension is independent of N, such that T(Z) contains all of the information about θ that is originally in Z. Such a statistic is known as a sufficient statistic.

Example A-1
Consider a sampled sequence of N manufactured cars. For each car we record whether it is defective or not. The observed sample can be represented as Z = col [z(1), ..., z(N)], where z(i) = 0 if the ith car is not defective and z(i) = 1 if the ith car is defective. The total number of observed defective cars is

T(Z) = Σ_{i=1}^{N} z(i)

This is a statistic that maps many different values of z(1), ..., z(N) into the same value of T(Z). It is intuitively clear that, if one is interested in estimating the proportion θ of defective cars, nothing is lost by simply recording and using T(Z) in place of z(1), ..., z(N). The particular sequence of ones and zeros is irrelevant. Thus, as far as estimating the proportion of defective cars is concerned, T(Z) contains all the information contained in Z. □

In Example A-2, the conditional distribution of the sample given T(Z) is shown to be independent of θ; hence, T(Z) = Σ_{i=1}^{N} z(i) is sufficient. Any one-to-one function of T(Z) is also sufficient. □

This example illustrates that deriving a sufficient statistic using Definition A-1 can be quite difficult. An equivalent definition of sufficiency, which is easy to apply, is given in the following:

Theorem A-1 (Factorization Theorem). A necessary and sufficient condition for T(Z) to be sufficient for θ is that there exists a factorization

p(Z|θ) = g(T(Z), θ) h(Z)   (A-1)

where the first factor in (A-1) may depend on θ, but depends on Z only through T(Z), whereas the second factor is independent of θ. □

The proof of this theorem is given in Ferguson (1967) for the continuous case and in Duda and Hart (1973) for the discrete case.

* This lesson was written by Dr. Rama Chellappa, Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089.
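The factorization theorem can be illustrated numerically for the Bernoulli sample of Example A-1: with h(Z) = 1, the likelihood depends on Z only through T(Z), so two different samples with the same T have identical likelihood functions. A short sketch:

```python
import numpy as np

def bernoulli_likelihood(z, theta):
    """Joint likelihood p(Z|theta) of a Bernoulli sample; by the factorization
    theorem it depends on Z only through T(Z) = sum z(i) (here h(Z) = 1)."""
    t = np.sum(z)
    return theta**t * (1.0 - theta)**(len(z) - t)

# two different samples with the same sufficient statistic T = 3
z1 = np.array([1, 1, 1, 0, 0, 0, 0, 0])
z2 = np.array([0, 0, 1, 0, 1, 0, 1, 0])
thetas = np.linspace(0.05, 0.95, 19)
same = np.allclose([bernoulli_likelihood(z1, th) for th in thetas],
                   [bernoulli_likelihood(z2, th) for th in thetas])
print(same)  # True: the ordering of the sample is irrelevant
```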
Example A-3 (Continuation of Example A-2)
In Example A-2, the probability distribution of samples z(1), ..., z(N) is

p(Z|θ) = θ^t (1 − θ)^{N−t}   (A-2)

where the total number of defective cars is t = Σ_{i=1}^{N} z(i) and each z(i) is either 0 or 1. Equation (A-2) can be written equivalently as

p(Z|θ) = exp [t ln (θ/(1 − θ)) + N ln (1 − θ)]   (A-3)

Comparing (A-3) with (A-1), we conclude that h(Z) = 1 and that the first factor depends on Z only through t; hence, T(Z) = t is sufficient for θ. □

EXPONENTIAL FAMILIES OF DISTRIBUTIONS

For example, the family of normal distributions N(μ, σ²), with σ² known and μ unknown, is an exponential family which, as we have seen in Example A-4, has a one-dimensional sufficient statistic for μ that is equal to Σ_{i=1}^{N} z(i). As Bickel and Doksum (1977) state:

Definition A-2 (Bickel and Doksum, 1977). If there exist real-valued functions a(θ) and b(θ) on parameter space Θ, and real-valued functions T(z) [z is short for z(i)] and h(z), such that the density function p(z|θ) can be written as

p(z|θ) = exp [a(θ)T(z) + b(θ) + h(z)]   (A-4)

then p(z|θ), θ ∈ Θ, is said to be a one-parameter exponential family of distributions. □

Definition A-3. If there exist real matrices A₁(θ), ..., A_m(θ), a real function b of θ, where θ ∈ Θ, real matrices T_i(z), and a real function h(z), such that the density function p(z|θ) can be written as

p(z|θ) = exp [ Σ_{i=1}^{m} tr (A_i'(θ)T_i(z)) + b(θ) + h(z) ]   (A-5)

then p(z|θ), θ ∈ Θ, is said to be an m-parameter exponential family of distributions. □

Example A-6
The family of d-variate normal distributions N(μ, P_μ), where both μ and P_μ are unknown, is an example of a 2-parameter exponential family in which θ contains μ and the elements of P_μ. In this case

A₁(θ) = P_μ^{-1} μ,  T₁(z) = z
A₂(θ) = −(1/2) P_μ^{-1},  T₂(z) = zz'

and h(z) = 0. Using Theorem A-1 or Definition A-3 it can be seen that Σ_{i=1}^{N} z(i) and Σ_{i=1}^{N} z(i)z'(i) are sufficient for (μ, P_μ). □

Let L(θ|Z) = ln p(Z|θ). Maximum-likelihood estimates of θ are obtained by solving the system of n equations

∂L(θ|Z)/∂θ_i = 0,  i = 1, 2, ..., n   (A-8)

for θ̂_ML and checking whether the solution to (A-8) satisfies (A-7). When this technique is applied to members of exponential families, θ̂_ML can be obtained by solving a set of algebraic equations. The following theorem, paraphrased from Bickel and Doksum (1977), formalizes this technique for vector observations.

Theorem A-2 (Bickel and Doksum, 1977). Let p(z|θ) = exp [a(θ)T(z) + b(θ) + h(z)] and let 𝒜 denote the interior of the range of a(θ). If the equation

E_θ{T(z)} = T(z)   (A-9)

has a solution θ̂(z) for which a[θ̂(z)] ∈ 𝒜, then θ̂(z) is the unique MLE of θ. □

The proof of this theorem can be found in Bickel and Doksum (1977).

Example A-7 (Continuation of Example A-5)
In this case, applying (A-9) gives

N μ̂ = Σ_{i=1}^{N} z(i)

which is the well-known MLE of μ. □

Example A-9 (Linear Model)
Consider the linear model Z(k) = X(k)θ + V(k), where V(k) is Gaussian with known covariance matrix R(k). Writing p(Z(k)|θ) in exponential-family form gives a(θ) = θ,

T(Z(k)) = X'(k)R^{-1}(k)Z(k)

and

b(θ) = −(N/2) ln 2π − (1/2) ln det R(k) − (1/2) θ'X'(k)R^{-1}(k)X(k)θ

Observe that

E_θ{X'(k)R^{-1}(k)Z(k)} = X'(k)R^{-1}(k)X(k)θ

hence, applying (A-9), we obtain

X'(k)R^{-1}(k)X(k)θ̂ = X'(k)R^{-1}(k)Z(k)
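Theorem A-2 reduces maximum-likelihood estimation in a one-parameter exponential family to the moment equation (A-9). The sketch below applies it to the exponential density p(z|θ) = θe^{−θz} (the density of Problem A-5), for which (A-9) gives θ̂_ML = N/Σ z(i), and cross-checks the result by brute force; the sample itself is simulated.

```python
import numpy as np

# One-parameter exponential family p(z|theta) = theta * exp(-theta z), z >= 0:
# a(theta) = -theta and T(z) = z. For an i.i.d. sample, equation (A-9) reads
# E_theta{sum z(i)} = N/theta = sum z(i), so theta_ML = N / sum z(i).
rng = np.random.default_rng(3)
z = rng.exponential(scale=1.0 / 2.5, size=5000)   # true theta = 2.5

theta_ml = len(z) / z.sum()                        # solution of (A-9)

# cross-check: brute-force maximization of the log-likelihood on a fine grid
grid = np.linspace(0.5, 5.0, 20001)
loglike = len(z) * np.log(grid) - grid * z.sum()
theta_grid = grid[np.argmax(loglike)]
print(theta_ml, theta_grid)   # the two values agree to grid resolution
```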
whose solution, θ̂(k), is

θ̂(k) = [X'(k)R^{-1}(k)X(k)]^{-1} X'(k)R^{-1}(k)Z(k)

which is the well-known expression for the MLE of θ (see Theorem 11-3). The case when R(k) = σ²I, where σ² is unknown, can be handled in a manner very similar to that in Example A-8. □

SUFFICIENT STATISTICS AND UNIFORMLY MINIMUM-VARIANCE UNBIASED ESTIMATION

In this section we discuss how sufficient statistics can be used to obtain uniformly minimum-variance unbiased (UMVU) estimates. Recall, from Lesson 6, that an estimate θ̂ of parameter θ is said to be unbiased if

E{θ̂} = θ   (A-10)

Among such unbiased estimates, we can often find one estimate, denoted θ*, which improves all other estimates in the sense that

var (θ*) ≤ var (θ̂)   (A-11)

When (A-11) is true for all (admissible) values of θ, θ* is known as the UMVU estimate of θ. The UMVU estimator is obtained by choosing the estimator which has the minimum variance among the class of unbiased estimators. If the estimator is constrained further to be a linear function of the observations, then it becomes the BLUE, which was discussed in Lesson 9.

Suppose we have an estimate, θ̂(Z), of parameter θ that is based on observations Z = col [z(1), ..., z(N)]. Assume further that p(Z|θ) has a finite-dimensional sufficient statistic, T(Z), for θ. Using T(Z), we can construct an estimate θ*(Z) which is at least as good as, or even better than, θ̂ by the celebrated Rao-Blackwell theorem (Bickel and Doksum, 1977). We do this by computing the conditional expectation of θ̂(Z), i.e.,

θ*(Z) = E{θ̂(Z)|T(Z)}   (A-12)

Estimate θ*(Z) is better than θ̂ in the sense that E{[θ*(Z) − θ]²} ≤ E{[θ̂(Z) − θ]²}.

Completeness is a property of the family of distributions of T(Z) generated as θ varies over its range. The concept of a complete sufficient statistic, as stated by Lehmann (1983), can be viewed as an extension of the notion of sufficient statistics in reducing the amount of useful information required for the estimation of θ. Although a sufficient statistic achieves data reduction, it may contain some additional information not required for the estimation of θ. For instance, it may be that E_θ[g(T(Z))] is a constant independent of θ for some nonconstant function g. If so, we would like to have E_θ[g(T(Z))] = c (a constant independent of θ) imply that g(T(Z)) = c. By subtracting c from E_θ[g(T(Z))], one arrives at Definition A-4. Proving completeness using Definition A-4 can be cumbersome. In the special case when p(z(k)|θ) is a one-parameter exponential family, i.e., when

p(z(k)|θ) = exp [a(θ)T(z(k)) + b(θ) + h(z(k))]   (A-13)

the completeness of T(z(k)) can be verified by checking whether the range of a(θ) contains an open interval (Lehmann, 1959).

Example A-10
Let Z = col [z(1), ..., z(N)] be a random sample drawn from a univariate Gaussian distribution whose mean μ is unknown and whose variance σ² > 0 is known. From Example A-5, we know that the distribution of Z forms a one-parameter exponential family, with T(Z) = Σ_{i=1}^{N} z(i) and a(μ) = μ/σ². Because a(μ) ranges over an open interval as μ varies from −∞ to +∞, T(Z) = Σ_{i=1}^{N} z(i) is complete and sufficient.

The same conclusion can be obtained using Definition A-4, as follows. We must show that the Gaussian family of probability distributions (with μ unknown and σ² fixed) is complete. Note that the sufficient statistic T(Z) = Σ_{i=1}^{N} z(i) (see Example A-5) is Gaussian with mean Nμ and variance Nσ². Suppose g is a function such that E_μ{g(T)} = 0 for all −∞ < μ < ∞; then,

∫_{−∞}^{∞} (1/√(2π)) g(vσ√N + Nμ) exp (−v²/2) dv = 0   (A-14)

implies g(·) = 0 for all values of the argument of g. □
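The Rao-Blackwell construction (A-12) can be demonstrated by simulation for a Bernoulli sample: the crude unbiased estimate θ̂ = z(1) and its conditioned version θ* = E{z(1)|T} = T/N are both unbiased, but conditioning on the sufficient statistic cuts the variance by roughly a factor of N. The sample sizes below are arbitrary.

```python
import numpy as np

# Rao-Blackwell (A-12) for a Bernoulli(p) sample of size N:
# theta_hat(Z) = z(1) is unbiased for p; conditioning on the complete
# sufficient statistic T(Z) = sum z(i) gives theta_star = T/N, the sample mean.
rng = np.random.default_rng(4)
p, N, trials = 0.3, 20, 20000

z = rng.random((trials, N)) < p        # Bernoulli(p) samples, one row per trial
crude = z[:, 0].astype(float)          # theta_hat = z(1)
rb = z.mean(axis=1)                    # theta_star = T/N

print(crude.mean(), rb.mean())         # both are approximately p (unbiased)
print(crude.var(), rb.var())           # variance drops from ~p(1-p) to ~p(1-p)/N
```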
Theorem A-4 [Bickel and Doksum (1977) and Lehmann (1959)]. Let p(z|θ) be an m-parameter exponential family given by

p(z|θ) = exp [ Σ_{i=1}^{m} a_i(θ)T_i(z) + b(θ) + h(z) ]

where a₁, ..., a_m and b are real-valued functions of θ, and T₁, ..., T_m and h are real-valued functions of z. Suppose that the range of a = col [a₁(θ), ..., a_m(θ)] contains an open m-rectangle [if (x₁, y₁), ..., (x_m, y_m) are m open intervals, the set {(s₁, ..., s_m): x_i < s_i < y_i, 1 ≤ i ≤ m} is called the open m-rectangle]; then T(z) = col [T₁(z), ..., T_m(z)] is complete as well as sufficient. □

A proof of this theorem can be found in Bickel and Doksum (1977). This theorem can be applied in two ways to determine an UMVU estimator [Bickel and Doksum (1977), and Lehmann (1959)]:

Method 1. Find a statistic of the form h(T(Z)) such that E{h(T(Z))} = θ, where T(Z) is a complete and sufficient statistic for θ. Then h(T(Z)) is an UMVU estimator of θ. This follows from the fact that ...

Method 2. Find an unbiased estimator, θ̂, of θ; then E{θ̂|T(Z)} is an UMVU estimator of θ for a complete and sufficient statistic T(Z).

Example A-11 (Continuation of Example A-10)
We know that T(Z) = Σ_{i=1}^{N} z(i) is a complete and sufficient statistic for μ. Furthermore, (1/N) Σ_{i=1}^{N} z(i) is an unbiased estimator of μ; hence, we obtain the well-known result from Method 1, that the sample mean, (1/N) Σ_{i=1}^{N} z(i), is an UMVU estimate of μ. Because this estimator is linear, it is also the BLUE of μ. □

Example A-12 (Linear Model)
As in Example A-9, consider the linear model

Z(k) = X(k)θ + V(k)   (A-16)

where θ is a deterministic but unknown n × 1 vector of parameters, X(k) is deterministic, and E{V(k)} = 0. Additionally, assume that V(k) is Gaussian with known covariance matrix R(k). Then, the statistic T(Z(k)) = X'(k)R^{-1}(k)Z(k) is sufficient (see Example A-9). That it is also complete can be seen by using Theorem A-4. To obtain the UMVU estimate θ*, we need to identify a function h[T(Z(k))] such that E{h[T(Z(k))]} = θ. The structure of h[T(Z(k))] is obtained by observing that ...

Example A-13 (This example is taken from Bickel and Doksum, 1977, pp. 123-124)
As in Example A-4, let Z = col [z(1), ..., z(N)] be a sample from a N(μ, σ²) population where both μ and σ² are unknown. As a special case of Example A-6, we observe that the distribution of Z forms a two-parameter exponential family where θ = col (μ, σ²). Because col [a₁(θ), a₂(θ)] = col (μ/σ², −1/(2σ²)) ranges over the lower half-plane as θ ranges over col [(−∞, ∞), (0, ∞)], the conditions of Theorem A-4 are satisfied. As a result, T(Z) = col [Σ_{i=1}^{N} z(i), Σ_{i=1}^{N} z²(i)] is complete and sufficient. □

Theorem A-3 also generalizes in a straightforward manner to:

Theorem A-5. If a complete and sufficient statistic T(Z) = col [T₁(Z), ..., T_m(Z)] exists for θ, and θ̂ is an unbiased estimator of θ, then θ*(Z) = E{θ̂|T(Z)} is an UMVU estimator of θ. If the elements of the covariance matrix of θ*(Z) are finite for all θ, then θ*(Z) is the unique UMVU estimate of θ. □

The proof of this theorem is a straightforward extension of the proof of Theorem A-3, which can be found in Bickel and Doksum (1977).

Example A-14 (Continuation of Example A-13)
In Example A-13 we saw that col [T₁(Z), T₂(Z)] = col [Σ_{i=1}^{N} z(i), Σ_{i=1}^{N} z²(i)] is sufficient and complete for both μ and σ². Furthermore, since ...
Additional discussions of UMVU estimators are found, for example, in Bickel and Doksum (1977) and Lehmann (1980).
PROBLEMS Glossary
A-l. Suppose z(l), . . . , z(N) are independent random variables, each uniform on
[0,6], where 8 > 0 is unknown. Find a sufficient statistic for 8.
A-2. Supposewe have two independent observations from the Cauchy distribution,
of Maior Results
1 1 -xc<zx
P(Z) = -
-7r1 + (2 - 0)
Show that no sufficient statistic exists for 8.
A-3. Let z(l), z(2), . . . , z(N) be generated by the first-order auto-regressive
process,
z(i) = &(i - 1) + fPw(9
where {w(i), i = l,..., N} is an independent and identically distributed
Gaussian noise sequencewith zero mean and unit variance. Find a sufficient
statistic for 8 and p.
A-4. Suppose that T(%) is sufficent Afor 8, and that 6(s) is a maximum-
likelihood estimate of 8. Show that e(Z) depends on (r: only through T(3). Equations (3-10) Batch formulas for &+&k) and &#).
A-S. Using Theorem A-2, derive the maximum-likelihood estimator of 8 when and (3-11)
observations z (l), . . . , z(N) denote a sample from Theorem 4-1 Information form of recursive LSE.
p (2 (i)le) = 8e -e4i) z(i) rO,e >O Lemma 4-l Matrix inversion lemma.
A-6. Show that the family of Bernoulli distributions, with unknown probability of Theorem 4-2 Covariance form of recursive LSE.
successp (0 2sp ~5l), is complete. Theorem 5-l Multistage LSE.
A-7. Show that the family of uniform distributions on (0,Q where 8 > 0 is unknown, Theorem 6-l Necessary and sufficient conditions for a linear
is complete. batch estimator to be unbiased.
A-8. Let z(l), . . . , z(N) be independent and identically distributed samples, where Theorem 6-2 Sufficient condition for a linear recursive esti-
p (z (i)iS) is a Bernoulli distribution with unknown probability of success
p (0 5 p 5 1). Find a complete sufficient statistic, T; the UMVU estimate 4(T)
mator to be unbiased.
of p; and, the variance of 4(T). Theorem 6-3 Cramer-Rao inequality for a scalarparameter.
A-9. [Taken from Bickel and Doksum (1977)]. Let z(l), z(2), . . . , z(N) be an Corollary 6-l Achieving the Cramer-Rao lower bound.
independent and identically distributed sample from N&,1). Find the UMVU Theorem 6-4 Cramer-Rao inequality for a vector of param-
estimator of pg [z (1) 101. eters.
A-10. [Taken from Bickel and Doksum (1977)]. Supposethat & and z are two UMVU Corollary 6-2 Inequality for error-variance of ith parameter.
estimates of 8 with finite variances. Show that z = T2.
Theorem 7-l Mean-squared convergence implies convergence
A-11. In Example A-12 prove that T@!(k)) is complete.
in probability.
Theorem 7-2 Conditions under which i(k) is a consistent esti-
mator of 8.
Theorem 8-l Sufficient conditions for &&) to be an unbiased
estimator of 8.
Theorem 8-2 A formula for cov [&&)].
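Two of the sufficiency results exercised in these problems can be checked numerically; the following Python sketch is illustrative only and is not part of the text. For Problem A-1, the sample maximum is a sufficient statistic for θ on a uniform [0, θ] family, and it is also the maximum-likelihood estimate; for Problem A-8, the sample sum T is a complete sufficient statistic for the Bernoulli parameter, and T/N is the UMVU estimate of p. The sample size and parameter values below are arbitrary choices for the demonstration.

```python
# Numerical illustration (not from the text) of Problems A-1 and A-8.
import random

random.seed(1)
theta, p, N = 2.0, 0.3, 10000

# Uniform [0, theta]: the ML estimate of theta is the sample maximum,
# which is the sufficient statistic T = max z(i) of Problem A-1.
z_unif = [random.uniform(0.0, theta) for _ in range(N)]
theta_ml = max(z_unif)

# Bernoulli(p): T = sum of the samples is complete and sufficient,
# and the UMVU estimate of p is T / N (Problem A-8).
z_bern = [1 if random.random() < p else 0 for _ in range(N)]
T = sum(z_bern)
p_umvu = T / N

print(round(theta_ml, 4), round(p_umvu, 4))
```

As N grows, the sample maximum approaches θ from below (its bias is of order θ/N), and T/N concentrates around p with variance p(1 − p)/N, the variance asked for in Problem A-8.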
Appendix A

Glossary of Major Results

Equations (3-10) and (3-11)  Batch formulas for θ̂_WLS(k) and θ̂_LS(k).
Theorem 4-1  Information form of recursive LSE.
Lemma 4-1  Matrix inversion lemma.
Theorem 4-2  Covariance form of recursive LSE.
Theorem 5-1  Multistage LSE.
Theorem 6-1  Necessary and sufficient conditions for a linear batch estimator to be unbiased.
Theorem 6-2  Sufficient condition for a linear recursive estimator to be unbiased.
Theorem 6-3  Cramer-Rao inequality for a scalar parameter.
Corollary 6-1  Achieving the Cramer-Rao lower bound.
Theorem 6-4  Cramer-Rao inequality for a vector of parameters.
Corollary 6-2  Inequality for error-variance of ith parameter.
Theorem 7-1  Mean-squared convergence implies convergence in probability.
Theorem 7-2  Conditions under which θ̂(k) is a consistent estimator of θ.
Theorem 8-1  Sufficient conditions for θ̂_LS(k) to be an unbiased estimator of θ.
Theorem 8-2  A formula for cov[θ̂_LS(k)].
Corollary 8-1  A formula for cov[θ̂_LS(k)] under special conditions on the measurement noise.
Theorem 8-3  An unbiased estimator of σ².
Theorem 8-4  Sufficient conditions for θ̂_LS(k) to be a consistent estimator of θ.
Theorem 8-5  Sufficient conditions for σ̂²(k) to be a consistent estimator of σ².
Equation (9-22)  Batch formula for θ̂_BLU(k).
Theorem 9-1  The relationship between θ̂_BLU(k) and θ̂_WLS(k).
Corollary 9-1  When all the results obtained in Lessons 3, 4 and 5 for θ̂_WLS(k) can be applied to θ̂_BLU(k).
Theorem 9-2  When θ̂_BLU(k) equals θ̂_LS(k) (Gauss-Markov Theorem).
Theorem 9-3  A formula for cov[θ̂_BLU(k)].
Corollary 9-2  The equivalence between P(k) and cov[θ̂_BLU(k)].
Theorem 9-4  Most efficient estimator property of θ̂_BLU(k).
Corollary 9-3  When θ̂_BLU(k) is a most efficient estimator of θ.
Theorem 9-5  Invariance of θ̂_BLU(k) to scale changes.
Theorem 9-6  Information form of recursive BLUE.
Theorem 9-7  Covariance form of recursive BLUE.
Definition 10-1  Likelihood defined.
Theorem 10-1  Likelihood ratio of combined data from statistically independent sets of data.
Theorem 11-1  Large-sample properties of maximum-likelihood estimates.
Theorem 11-2  Invariance property of MLEs.
Theorem 11-3  Condition under which θ̂_ML(k) = θ̂_BLU(k).
Corollary 11-1  Conditions under which θ̂_ML(k) = θ̂_BLU(k) = θ̂_LS(k), and resulting estimator properties.
Theorem 12-1  A formula for p(x|y) when x and y are jointly Gaussian.
Theorem 12-2  Properties of E{x|y} when x and y are jointly Gaussian.
Theorem 12-3  Expansion formula for E{x|y, z} when x, y, and z are jointly Gaussian, and y and z are statistically independent.
Theorem 12-4  Expansion formula for E{x|y, z} when x, y, and z are jointly Gaussian and y and z are not necessarily statistically independent.
Theorem 13-1  A formula for θ̂_MS(k) (The Fundamental Theorem of Estimation Theory).
Corollary 13-1  A formula for θ̂_MS(k) when θ and Z(k) are jointly Gaussian.
Corollary 13-2  A linear mean-squared estimator of θ in the non-Gaussian case.
Corollary 13-3  Orthogonality principle.
Theorem 13-2  When θ̂_MAP(k) = θ̂_MS(k).
Theorem 14-1  Conditions under which θ̂_MS(k) = θ̂_WLS(k).
Theorem 14-2  Condition under which θ̂_MS(k) = θ̂_BLU(k).
Theorem 14-3  Condition under which θ̂_MAP(k) = θ̂_BLU(k).
Theorem 15-1  Expansion of a joint probability density function for a first-order Markov process.
Theorem 15-2  Calculation of conditional expectation for a first-order Markov process.
Theorem 15-3  Interpretation of Gaussian white noise as a special first-order Markov process.
Equations (15-17) and (15-18)  The basic state-variable model.
Theorem 15-4  Conditions under which x(k) is a Gauss-Markov sequence.
Theorem 15-5  Recursive equations for computing m_x(k) and P_x(k).
Theorem 15-6  Formulas for computing m_z(k) and P_z(k).
Equations (16-4) and (16-11)  Single-stage predictor formulas for x̂(k|k−1) and P(k|k−1).
Theorem 16-1  Formula for and properties of general state predictor, x̂(k|j), k > j.
Theorem 16-2  Representations and properties of the innovations process.
Theorem 17-1  Kalman filter formulas and properties of resulting estimates and estimation error.
Theorem 19-1  Steady-state Kalman filter.
Theorem 19-2  Equivalence of steady-state Kalman filter and infinite-length digital Wiener filter.
Theorem 20-1  Single-stage smoother formula for x̂(k|k+1).
Corollary 20-1  Relationship between single-stage smoothing gain matrix and Kalman gain matrix.
Corollary 20-2  Another way to express x̂(k|k+1).
Theorem 20-2  Double-stage smoother formula for x̂(k|k+2).
Corollary 20-3  Relationship between double-stage smoothing gain matrix and Kalman gain matrix.
Corollary 20-4  Two other ways to express x̂(k|k+2).
Theorem 21-1  Formulas for a useful fixed-interval smoother of x(k), x̂(k|N), and its error-covariance matrix, P(k|N).
Theorem 21-2  Formulas for a most useful two-pass fixed-interval smoother of x(k) and its associated error-covariance matrix.
Theorem 21-3  Formulas for a most useful fixed-point smoothed estimator of x(k), x̂(k|k+l) where l = 1, 2, . . . , and its associated error-covariance matrix, P(k|k+l).
Theorem 22-1  Conditions under which a single-channel state-variable model is equivalent to a convolutional sum model.
Theorem 22-2  Recursive minimum-variance deconvolution formulas.
Theorem 22-3  Steady-state MVD filter, and the zero-phase nature of that filter.
Theorem 22-4  Equivalence between steady-state MVD filter and Berkhout's infinite impulse response digital Wiener deconvolution filter.
Theorem 22-5  Maximum-likelihood deconvolution results.
Theorem 22-6  Structure of minimum-variance waveshaper.
Theorem 22-7  Recursive fixed-interval waveshaping results.
Theorem 23-1  How to handle biases that may be present in a state-variable model.
Theorem 23-2  Predictor-corrector Kalman filter for the correlated noise case.
Corollary 23-1  Recursive predictor formulas for the correlated noise case.
Corollary 23-2  Recursive filter formulas for the correlated noise case.
Equations (24-1) and (24-2)  Nonlinear state-variable model.
Equations (24-23) and (24-30)  Perturbation state-variable model.
Theorem 24-1  Solution to a time-varying continuous-time state equation.
Equations (24-39) and (24-44)  Discretized state-variable model.
Theorem 25-1  A consequence of relinearizing about x̂(k|k).
Equations (25-22) and (25-27)  Extended Kalman filter prediction and correction equations.
Theorem 26-1  Formula for the log-likelihood function of the basic state-variable model.
Theorem 26-2  Closed-form formula for the maximum-likelihood estimate of the steady-state value of the innovations covariance matrix.
Theorem 27-1  Kalman-Bucy filter equations.
Definition A-1  Sufficient statistic defined.
Theorem A-1  Factorization theorem.
Theorem A-2  A method for computing the unique maximum-likelihood estimator of θ that is associated with exponential families of distributions.
Theorem A-3  Lehmann-Scheffé Theorem; provides a uniformly minimum-variance unbiased estimator of θ.
Theorem A-4  Method for determining whether or not T(z) is complete as well as sufficient when p(z|θ) is an m-parameter exponential family.
Theorem A-5  Provides a uniformly minimum-variance unbiased estimator of vector θ.
References

AGUILERA, R., J. A. DEBREMAECKER, and S. HERNANDEZ. 1970. Design of recursive filters. Geophysics, Vol. 35, pp. 247-253.
ANDERSON, B. D. O., and J. B. MOORE. 1979. Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall.
AOKI, M. 1967. Optimization of Stochastic Systems: Topics in Discrete-Time Systems. NY: Academic Press.
ÅSTRÖM, K. J. 1968. Lectures on the identification problem: the least squares method. Rept. No. 6806, Lund Institute of Technology, Division of Automatic Control.
ATHANS, M. 1971. The role and use of the stochastic linear-quadratic-Gaussian problem in control system design. IEEE Trans. on Automatic Control, Vol. AC-16, pp. 529-552.
ATHANS, M., and P. L. FALB. 1965. Optimal Control: An Introduction to the Theory and Its Applications. NY: McGraw-Hill.
ATHANS, M., and F. SCHWEPPE. 1965. Gradient matrices and matrix calculations. MIT Lincoln Labs., Lexington, MA, Tech. Note 1965-53.
ATHANS, M., and E. TSE. 1967. A direct derivation of the optimal linear filter using the maximum principle. IEEE Trans. on Automatic Control, Vol. AC-12, pp. 690-698.
ATHANS, M., R. P. WISHNER, and A. BERTOLINI. 1968. Suboptimal state estimation for continuous-time nonlinear systems from discrete noisy measurements. IEEE Trans. on Automatic Control, Vol. AC-13, pp. 504-514.
BARANKIN, E. W., and M. KATZ, JR. 1959. Sufficient statistics of minimal dimension. Sankhya, Vol. 21, pp. 217-246.
BARANKIN, E. W. 1961. Application to exponential families of the solution to the minimal dimensionality problem for sufficient statistics. Bull. Inst. Internat. Stat., Vol. 38, pp. 141-150.
BARD, Y. 1970. Comparison of gradient methods for the solution of nonlinear parameter estimation problems. SIAM J. Numerical Analysis, Vol. 7, pp. 157-186.
BERKHOUT, A. J. 1977. Least-squares inverse filtering and wavelet deconvolution. Geophysics, Vol. 42, pp. 1369-1383.
BICKEL, P. J., and K. A. DOKSUM. 1977. Mathematical Statistics: Basic Ideas and Selected Topics. San Francisco: Holden-Day.
BIERMAN, G. J. 1973a. A comparison of discrete linear filtering algorithms. IEEE Trans. on Aerospace and Electronic Systems, Vol. AES-9, pp. 28-37.
BIERMAN, G. J. 1973b. Fixed-interval smoothing with discrete measurements. Int. J. Control, Vol. 18, pp. 65-75.
BIERMAN, G. J. 1977. Factorization Methods for Discrete Sequential Estimation. NY: Academic Press.
BRYSON, A. E., JR., and M. FRAZIER. 1963. Smoothing for linear and nonlinear dynamic systems. TDR 63-119, pp. 353-364, Aero. Sys. Div., Wright-Patterson Air Force Base, Ohio.
BRYSON, A. E., JR., and D. E. JOHANSEN. 1965. Linear filtering for time-varying systems using measurements containing colored noise. IEEE Trans. on Automatic Control, Vol. AC-10, pp. 4-10.
BRYSON, A. E., JR., and Y. C. HO. 1969. Applied Optimal Control. Waltham, MA: Blaisdell.
CHEN, C. T. 1970. Introduction to Linear System Theory. NY: Holt.
CHI, C. Y. 1983. Single-channel and multichannel deconvolution. Ph.D. dissertation, Univ. of Southern California, Los Angeles, CA.
CHI, C. Y., and J. M. MENDEL. 1984. Performance of minimum-variance deconvolution filter. IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-32, pp. 1145-1153.
CRAMER, H. 1946. Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press.
DAI, G-Z., and J. M. MENDEL. 1986. General problems of minimum-variance recursive waveshaping. IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-34.
DONGARRA, J. J., J. R. BUNCH, C. B. MOLER, and G. W. STEWART. 1979. LINPACK Users Guide. Philadelphia: SIAM.
DUDA, R. O., and P. E. HART. 1973. Pattern Classification and Scene Analysis. NY: Wiley Interscience.
EDWARDS, A. W. F. 1972. Likelihood. London: Cambridge Univ. Press.
FAURRE, P. L. 1976. Stochastic realization algorithms. In System Identification: Advances and Case Studies (eds., R. K. Mehra and D. G. Lainiotis), pp. 1-25. NY: Academic Press.
FERGUSON, T. S. 1967. Mathematical Statistics: A Decision Theoretic Approach. NY: Academic Press.
FRASER, D. 1967. Discussion of optimal fixed-point continuous linear smoothing (by J. S. Meditch). Proc. 1967 Joint Automatic Control Conf., p. 249, Univ. of PA, Philadelphia.
GOLDBERGER, A. S. 1964. Econometric Theory. NY: John Wiley.
GRAYBILL, F. A. 1961. An Introduction to Linear Statistical Models, Vol. 1. NY: McGraw-Hill.
GUPTA, N. K., and R. K. MEHRA. 1974. Computational aspects of maximum likelihood estimation and reduction of sensitivity function calculations. IEEE Trans. on Automatic Control, Vol. AC-19, pp. 774-783.
GURA, I. A., and A. B. BIERMAN. 1971. On computational efficiency of linear filtering algorithms. Automatica, Vol. 7, pp. 299-314.
HAMMING, R. W. 1983. Digital Filters, 2nd Edition. Englewood Cliffs, NJ: Prentice-Hall.
HO, Y. C. 1963. On the stochastic approximation and optimal filtering. J. of Math. Anal. and Appl., Vol. 6, pp. 152-154.
JAZWINSKI, A. H. 1970. Stochastic Processes and Filtering Theory. NY: Academic Press.
KAILATH, T. 1968. An innovations approach to least-squares estimation, Part I: Linear filtering in additive white noise. IEEE Trans. on Automatic Control, Vol. AC-13, pp. 646-655.
KAILATH, T. 1980. Linear Systems. Englewood Cliffs, NJ: Prentice-Hall.
KALMAN, R. E. 1960. A new approach to linear filtering and prediction problems. Trans. ASME, J. Basic Eng., Series D, Vol. 82, pp. 35-46.
KALMAN, R. E., and R. BUCY. 1961. New results in linear filtering and prediction theory. Trans. ASME, J. Basic Eng., Series D, Vol. 83, pp. 95-108.
KASHYAP, R. L., and A. R. RAO. 1976. Dynamic Stochastic Models from Empirical Data. NY: Academic Press.
KELLY, C. N., and B. D. O. ANDERSON. 1971. On the stability of fixed-lag smoothing algorithms. J. Franklin Inst., Vol. 291, pp. 271-281.
KMENTA, J. 1971. Elements of Econometrics. NY: Macmillan.
KOPP, R. E., and R. J. ORFORD. 1963. Linear regression applied to system identification for adaptive control systems. AIAA J., Vol. 1, pp. 2300-2306.
KUNG, S. Y. 1978. A new identification and model reduction algorithm via singular value decomposition. Paper presented at the 12th Annual Asilomar Conference on Circuits, Systems, and Computers, Pacific Grove, CA.
KWAKERNAAK, H., and R. SIVAN. 1972. Linear Optimal Control Systems. NY: Wiley-Interscience.
LAUB, A. J. 1979. A Schur method for solving algebraic Riccati equations. IEEE Trans. on Automatic Control, Vol. AC-24, pp. 913-921.
LEHMANN, E. L. 1959. Testing Statistical Hypotheses. NY: John Wiley.
LEHMANN, E. L. 1980. Theory of Point Estimation. NY: John Wiley.
LJUNG, L. 1976. Consistency of the least-squares identification method. IEEE Trans. on Automatic Control, Vol. AC-21, pp. 779-781.
LJUNG, L. 1979. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. on Automatic Control, Vol. AC-24, pp. 36-50.
MARQUARDT, D. W. 1963. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math., Vol. 11, pp. 431-441.
MCLOUGHLIN, D. B. 1980. Distributed systems notes. Proc. 1980 Pre-JACC Tutorial Workshop on Maximum-Likelihood Identification, San Francisco, CA.
MEDITCH, J. S. 1969. Stochastic Optimal Linear Estimation and Control. NY: McGraw-Hill.
MEHRA, R. K. 1970a. An algorithm to solve matrix equations PHᵀ = G and P = ΦPΦᵀ + ΓΓᵀ. IEEE Trans. on Automatic Control, Vol. AC-15.
MEHRA, R. K. 1970b. On-line identification of linear dynamic systems with applications to Kalman filtering. Proc. Joint Automatic Control Conference, Atlanta, GA, pp. 373-382.
MEHRA, R. K. 1971. Identification of stochastic linear dynamic systems using Kalman filter representation. AIAA J., Vol. 9, pp. 28-31.
MEHRA, R. K., and J. S. TYLER. 1973. Case studies in aircraft parameter identification. Proc. 3rd IFAC Symposium on Identification and System Parameter Estimation, North Holland, Amsterdam.
MENDEL, J. M. 1971. Computational requirements for a discrete Kalman filter. IEEE Trans. on Automatic Control, Vol. AC-16, pp. 748-758.
MENDEL, J. M. 1973. Discrete Techniques of Parameter Estimation: The Equation Error Formulation. NY: Marcel Dekker.
MENDEL, J. M. 1975. Multistage least-squares parameter estimators. IEEE Trans. on Automatic Control, Vol. AC-20, pp. 775-782.
MENDEL, J. M. 1981. Minimum-variance deconvolution. IEEE Trans. on Geoscience and Remote Sensing, Vol. GE-19, pp. 161-171.
MENDEL, J. M. 1983a. Minimum-variance and maximum-likelihood recursive waveshaping. IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-31, pp. 599-604.
MENDEL, J. M. 1983b. Optimal Seismic Deconvolution: An Estimation Based Approach. NY: Academic Press.
MENDEL, J. M., and D. L. GIESEKING. 1971. Bibliography on the linear-quadratic-gaussian problem. IEEE Trans. on Automatic Control, Vol. AC-16, pp. 847-869.
MORRISON, N. 1969. Introduction to Sequential Smoothing and Prediction. NY: McGraw-Hill.
NAHI, N. E. 1969. Estimation Theory and Applications. NY: John Wiley.
OPPENHEIM, A. V., and R. W. SCHAFER. 1975. Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
PAPOULIS, A. 1965. Probability, Random Variables, and Stochastic Processes. NY: McGraw-Hill.
PELED, A., and B. LIU. 1976. Digital Signal Processing: Theory, Design, and Implementation. NY: John Wiley.
RAUCH, H. E., F. TUNG, and C. T. STRIEBEL. 1965. Maximum-likelihood estimates of linear dynamical systems. AIAA J., Vol. 3, pp. 1445-1450.
SCHWEPPE, F. C. 1965. Evaluation of likelihood functions for Gaussian signals. IEEE Trans. on Information Theory, Vol. IT-11, pp. 61-70.
SCHWEPPE, F. C. 1973. Uncertain Dynamic Systems. Englewood Cliffs, NJ: Prentice-Hall.
SHANKS, J. L. 1967. Recursion filters for digital processing. Geophysics, Vol. 32, pp. 33-51.
SORENSON, H. W. 1970. Least-squares estimation: from Gauss to Kalman. IEEE Spectrum, Vol. 7, pp. 63-68.
SORENSON, H. W. 1980. Parameter Estimation: Principles and Problems. NY: Marcel Dekker.
SORENSON, H. W., and J. E. SACKS. 1971. Recursive fading memory filtering. Information Sciences, Vol. 3, pp. 101-119.
STEFANI, R. T. 1967. Design and simulation of a high performance, digital, adaptive, normal acceleration control system using modern parameter estimation techniques. Rept. No. DAC-60637, Douglas Aircraft Co., Santa Monica, CA.
STEPNER, D. E., and R. K. MEHRA. 1973. Maximum likelihood identification and optimal input design for identifying aircraft stability and control derivatives. NASA-CR-2200.
STEWART, G. W. 1973. Introduction to Matrix Computations. NY: Academic Press.
TREITEL, S. 1970. Principles of digital multichannel filtering. Geophysics, Vol. 35, pp. 785-811.
TREITEL, S., and E. A. ROBINSON. 1966. The design of high-resolution digital filters. IEEE Trans. on Geoscience Electronics, Vol. GE-4, pp. 25-38.
TUCKER, H. G. 1962. An Introduction to Probability and Mathematical Statistics. NY: Academic Press.
TUCKER, H. G. 1967. A Graduate Course in Probability. NY: Academic Press.
VAN TREES, H. L. 1968. Detection, Estimation and Modulation Theory, Vol. I. NY: John Wiley.
ZACKS, S. 1971. The Theory of Statistical Inference. NY: John Wiley.

Index

Aerospace vehicle example, 21-23
Algebraic Riccati equation, 170-171, 278
Asymptotic distributions, 55-56
  asymptotic mean, 55-56
Asymptotic efficiency (see Large sample properties of estimators)
Asymptotic unbiasedness (see Large sample properties of estimators)
Asymptotic variance, 56
Basic state-variable model:
  defined, 131-132
  properties of, 133-137
Batch processing (see Least-squares estimation)
Best linear unbiased estimation, 73-79
Best linear unbiased estimator:
  comparison with maximum a posteriori estimator, 123-124
  comparison with mean-squared estimator, 122-123
  comparison with weighted least-squares estimator, 74-75
  derivation of batch form, 73-74
  properties, 75-78
  for random parameters, 121-123
  recursive forms, 78-79, 167
Biases (see Not-so-basic state-variable model)
BLUE (see Best linear unbiased estimator)
Causal invertibility, 157-158
Colored noises (see Not-so-basic state-variable model)
Conditional mean:
  defined, 102
  properties of, 104-106
Consistency (see Large sample properties of estimators)
Convergence in mean-square, 59
Convergence in probability, 58
Correlated noises (see Not-so-basic state-variable model)
Covariance form of recursive least-squares estimator, 31, 79
Cramer-Rao inequality:
  scalar parameter, 47
  vector of parameters, 51
Cross-sectional processing (see Least-squares estimation)
Deconvolution:
  maximum-likelihood (MLD), 124-125, 215-216
  minimum-variance (MVD), 120-121, 205-215
  model formulation, 12-14
Discretization of linear time-varying state-variable model, 242-245
Divergence phenomenon, 167-169
Efficiency (see Small sample properties of estimators)
Estimate types, 14-15 (see also Prediction; Filtering; Smoothing)
Estimation of deterministic parameters (see Least-squares estimation; Best linear unbiased estimation; Maximum-likelihood estimation)
Estimation of random parameters (see Mean-squared estimation; Maximum a posteriori estimation; Best linear unbiased estimator; Least-squares estimator)
Estimation techniques, 3 (see Best linear unbiased estimator; Least-squares estimator; Maximum-likelihood estimators; Mean-squared estimators; Maximum a posteriori estimation; State estimation)
Estimator properties (see Small sample properties of estimators; Large sample properties of estimators)
Expanding memory estimator, 23
Exponential family of distributions, 284-286
Extended Kalman filter:
  application to parameter estimation, 255-256
  correction equation, 253
  derived, 249-253
  iterated, 253-254
  prediction equation, 252
Factorization theorem, 283
Filtering:
  computations, 156
  covariance formulation, 157
  divergence phenomenon, 167-169
  examples, 160-169
  Kalman filter derivation, 151-153
  properties, 153-158
  recursive filter, 153
  relationship to Wiener filtering, 176-181
  steady-state Kalman filter, 170-176
Finite-difference equation coefficient identification (see Identification of)
Fisher information matrix, 51
Fixed-interval smoother, 183-184, 190, 193-199
Fixed-lag smoother, 184, 191, 201-202
Fixed memory estimator, 23
Fixed-point smoother, 184, 190, 199-201
Gauss-Markov random processes, defined, 128-129
Gaussian random processes (see Gauss-Markov random processes)
Gaussian random variables (see also Conditional mean):
  conditional density function, 102-104
  joint density function, 101-102
  multivariate density function, 101
  properties of, 104
  univariate density function, 100
Glossary of major results, 295-299
Hypotheses:
  binary, 81
  multiple, 85-86
Identifiability, 95-96
Identification of:
  coefficients of a finite-difference equation, 9, 65
  impulse response, 8-9, 39-42, 64-65
  initial condition vector in an unforced state equation model, 10
Impulse response identification (see Identification of)
Information form of recursive least-squares estimator, 29, 91
Initial condition identification (see Identification of)
Innovations process:
  defined, 146
  properties of, 146-147
Instrument calibration example, 20-21, 35-36
Invariance property of maximum-likelihood estimators (see Maximum-likelihood estimates)
Iterated least squares, 248-249
Kalman-Bucy filtering:
  derivation using a formal limiting procedure, 272-275
  derivation when structure of filter is prespecified, 275-278
  notation and problem statement for derivation, 271-272
  optimal control application, 280-281
  statement of, 272
  steady-state, 278-280
  system description for derivation, 271
Kalman-Bucy gain matrix, 272
Kalman filter (see Filtering; Steady-state Kalman filtering)
Kalman filter sensitivity system, 163, 262-263
Kalman filter tuning parameter, 167
Kalman gain matrix, 151
Large sample properties of estimators (see also Least-squares estimator):
  asymptotic efficiency, 60
  asymptotic unbiasedness, 57
  consistency, 57-60
Least-squares estimation processing:
  batch, 17-24
  cross-sectional, 37-38
  recursive, 26-32, 34-36
Least-squares estimator:
  derivation of batch form, 19-20
  derivation of recursive covariance form, 30-31
  derivation of recursive information form, 27-28
  examples, 20-23, 35-36
  initialization of recursive forms
  large sample properties, 68-70
  multistage, 38-42
  properties, 63-70
  for random parameters, 121
  small sample properties, 63-68
  for vector measurements, 36
Lehmann-Scheffé theorem, 292-293
Likelihood:
  compared with probability, 81-82
  conditional, 114
  continuous distributions, 85-88
  defined, 82-84
  unconditional, 114
Likelihood ratio:
  defined for multiple hypotheses, 85-86
  defined for two hypotheses, 84-86
Linear model:
  defined, 7
  examples of, 8-14 (see also Identification of; State estimation example; Deconvolution; Nonlinear measurement model)
Markov process, 129-130
Matrix inversion lemma, 30
Matrix Riccati differential equation, 272
Matrix Riccati equation, 155
Maximum a posteriori estimation:
  comparison with best linear unbiased estimator, 123-124
  comparison with mean-squared estimator, 114-116
  Gaussian linear model, 123-124
  general case, 114-116
Maximum-likelihood deconvolution (see Deconvolution)
Maximum-likelihood estimation, 89-97
Maximum-likelihood estimators:
  comparison with best linear unbiased estimator, 93-94
  comparison with least-squares estimator, 94
  for exponential families, 287-290
  the linear model, 92-94
  obtaining them, 89-91
  properties, 91-92
Maximum-likelihood method, 89
Maximum-likelihood state and parameter estimation:
  computing θ̂, 261
  log-likelihood function for the basic state-variable model, 259-261
  steady-state approximation, 264-268
Mean-squared convergence (see Convergence in mean-square)
Mean-squared estimation, 109-113
Mean-squared estimators:
  comparison with best linear unbiased estimator, 122-123
  comparison with maximum a posteriori estimator, 114-116
  derivation, 110
  Gaussian case, 111-113
  for linear and Gaussian model, 118-120
  properties of, 112-113
Measurement differencing technique, 231
Minimum-variance deconvolution (see Deconvolution)
MLD (see Deconvolution)
Modeling:
  estimation problem, 1
  measurement problem, 1
  representation problem, 1
  validation problem, 2
Multistage least-squares (see Least-squares estimator)
Multivariate Gaussian random variables (see Gaussian random variables)
MVD (see Deconvolution)
Nonlinear dynamical systems:
  discretized perturbation state-variable model, 245
  linear perturbation equations, 239-242
  model, 237-239
Nonlinear measurement model, 12
Not-so-basic state-variable model:
  biases, 224
  colored noises, 227-230
  correlated noises, 225-227
  perfect measurements, 230-233
Notation, 14-15
Orthogonality principle, 111-112
Parameter estimation (see Extended Kalman filter)
Perfect measurements (see Not-so-basic state-variable model)
Perturbation equations (see Nonlinear dynamical systems)
Philosophy:
  estimation theory, 6
  modeling, 5-6
Prediction:
  general, 142-145
  recursive predictor, 155
  single-stage, 140-142
  steady-state predictor, 173
Predictor-corrector form of Kalman filter, 151-153
Properties of best linear unbiased estimators (see Best linear unbiased estimator)
Properties of estimators (see Small sample properties of estimators; Large sample properties of estimators)
Properties of least-squares estimator (see Least-squares estimator)
Properties of maximum-likelihood estimators (see Maximum-likelihood estimators)
Properties of mean-squared estimators (see Mean-squared estimators)
Random processes (see Gauss-Markov random processes)
Random variables (see Gaussian random variables)
Recursive calculation of state covariance matrix, 134
Recursive calculation of state mean vector, 134
Recursive processing (see Least-squares estimation processing; Best linear unbiased estimator)
Recursive waveshaping, 216-222
Reduced-order Kalman filter, 231
Reduced-order state estimator, 231
Riccati equation (see Matrix Riccati equation; Algebraic Riccati equation)
Scale changes:
  best linear unbiased estimator, 77-78
Sensitivity of Kalman filter, 164-166
Signal-to-noise ratio, 137-138, 167
Single-channel steady-state Kalman filter, 173-176
Small sample properties of estimators (see also Least-squares estimator):
  efficiency, 46-52
  unbiasedness, 44-46
Smoothing:
  applications, 205-222
  double-stage, 187-189
  fixed-interval, 190, 193-199
  fixed-lag, 191, 201-202
  fixed-point, 190, 199-201
  single-stage, 184-187
  three types, 183-184
Stabilized form for computing P(k+1|k+1), 156
Standard form for computing P(k+1|k+1), 156
State estimation (see Prediction; Filtering; Smoothing)
State estimation example, 10-12
State-variable model (see Basic state-variable model; Not-so-basic state-variable model)
Steady-state approximation (see Maximum-likelihood state and parameter estimation)
Steady-state filter system, 173
Steady-state Kalman filter, 170-172
Steady-state MVD filter:
  defined, 207
  properties of, 212-213
  relationship to IIR Wiener deconvolution filter, 213-215
Steady-state predictor system, 173
Stochastic linear optimal output feedback regulator problem, 281
Sufficient statistics:
  defined, 282-284
  for exponential families of distributions, 285-286, 287-290
  and uniformly minimum-variance unbiased estimation, 290-294
Unbiasedness (see Small sample properties of estimators)
Uniformly minimum-variance unbiased estimation (see Sufficient statistics)
Variance estimator, 67-68
Waveshaping (see Recursive waveshaping)
Weighted least-squares estimator (see Least-squares estimator; Best linear unbiased estimator)
White noise, discrete, 130-131
Wiener filter:
  derivation, 178-179
  relation to Kalman filter, 180-181
Wiener-Hopf equations, 179
Zoom-in algorithm, 39-42