Recursive Parameter Estimation
1. Introduction
2. The Structure of Recursive Estimators
3. System and Signal Models
4. Recursive Least Squares
5. Recursive Instrumental Variables
6. Recursive Extended Least Squares
7. Time Varying Systems
8. Potential Operating Problems
9. Application of Recursive Identification
10. Conclusions
11. Appendix: formulae derivations
12. References
§1. INTRODUCTION
In an off-line (or batch) mode an experiment is carried out and afterwards all the
data are processed simultaneously. The methods employed for off-line system
identification are thus based on information from the plant which has been obtained
previously. This usually means that statistical tests are applied to a set of plant input-
output data in order to make an estimation of the model order and subsequently of
the values of the parameters within a model of that particular order.
In an on-line (or recursive) mode the data are used as soon as they are available.
The parameters can therefore be continuously estimated during the experiment. The
parameter estimates are recalculated each time new data becomes available. Thus
when the model is updated periodically, with reference to its past values, this is called
recursive identification or recursive parameter estimation. It is employed not only
within control algorithms, but also for many signal processing and filtering
problems.
Recursive or on-line methods have become increasingly important. Over the years,
many identification methods have been proposed. For the newcomer to the field it is
hard to see how the various methods are related. The field of identification was once called a "fiddler's paradise" (Åström and Eykhoff, 1971). It is still often viewed as a long, confusing list of methods and tricks. Coherence and
unification in the field of identification is not immediate. One reason is that methods
and algorithms have been developed in different areas with different applications in
mind.
The term “recursive identification” is taken from control literature. In statistical and
econometric literature the field is usually called “sequential parameter estimation”
and in signal processing and telecommunication the methods are known as
“adaptive filtering algorithms”.
Within these areas algorithms have been developed and analyzed over the last 30
years. Only recently, however, has there been a noticeable increase in interest in the
field from practitioners and industrial users. This is due to the construction of more
complex systems, where adaptive techniques (adaptive control, adaptive signal
processing) may be useful or necessary, and of course to the availability of
microprocessors for the easy implementation of more advanced algorithms.
In this text the most important forms of recursive identification are discussed.
Emphasis is placed on implementation and application aspects rather than on
theoretical (convergence) analysis. Moreover no attempt is made to make the list of
algorithms comprehensive.
§2. THE STRUCTURE OF RECURSIVE ESTIMATORS
As a simple introductory example, consider a process whose input-output behaviour is given by a constant gain K, i.e. the process transforming the measured input u into the measured output y has no dynamic elements.
Assuming that the data acquisition takes place in discrete time, as is normally the case, we have at time t received a sequence of measurements {u(1), y(1), u(2), y(2), ... u(i), y(i), ... u(t), y(t)}. As indicated, we are dealing with a static process, so summing the measurement equations over all samples gives

∑_{i=1}^{t} y(i) = K ∑_{i=1}^{t} u(i) + ∑_{i=1}^{t} v(i)   (4)
Now supposing that the average value of the random disturbances is zero, we have

∑_{i=1}^{t} v(i) ≈ 0

and a good estimate for K based on t pairs of measurement data {y(·), u(·)} is given by

K̂_t = ∑_{i=1}^{t} y(i) / ∑_{i=1}^{t} u(i)   (5)
In an off-line approach we would:
- store all measured values {y(·), u(·)} in a computer memory
- afterwards use the data to identify the unknown process gain K by means of formula (5)
Alternatively, in a recursive (on-line) approach we could:
- measure y(1), u(1), compute K̂_1, α_1 by means of (6)&(7) and forget about y(1), u(1)
- measure y(2), u(2), update to K̂_2, α_2 by means of (6)&(7) and forget about y(2), u(2)
- continue in this way until at time t we measure y(t), u(t), update to K̂_t, α_t and forget about y(t), u(t).
The estimate is adapted strongly if the prediction error is large (indicating that the estimate K̂_{t−1} is indeed not a good estimate for the process). It is
adapted weakly if the prediction error is small (indicating that Kˆ t −1 is already a good
estimate and so it does not need to be changed much). This structure is typical for
recursive identification algorithms.
The estimator gain 1/ α (t ) is computed by means of a second recursion equation (7).
Other forms are possible for this gain updating formula, leading to algorithms with
different characteristics.
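The recursion sketched above can be made concrete. The update formulas (6) and (7) themselves are not reproduced in this text, so the sketch below uses a reconstruction that is algebraically equivalent to the batch estimate (5): with α(t) = α(t−1) + u(t), the update K̂_t = K̂_{t−1} + [y(t) − K̂_{t−1}u(t)]/α(t) reproduces ∑y/∑u at every step. A minimal Python sketch (all signal values are illustrative):

```python
import random
random.seed(0)

K_true = 2.5                     # unknown process gain to be estimated

K_hat = 0.0                      # recursive estimate, K_hat(0) = 0
alpha = 0.0                      # accumulated denominator, alpha(0) = 0

sum_y = sum_u = 0.0              # batch sums, kept only to check against (5)

for t in range(1, 201):
    u = 1.0 + random.random()    # measured input (kept positive)
    v = random.gauss(0.0, 0.1)   # zero-mean random disturbance
    y = K_true * u + v           # measured output

    alpha += u                   # alpha(t) = alpha(t-1) + u(t)
    eps = y - K_hat * u          # prediction error with the old estimate
    K_hat += eps / alpha         # K_hat(t) = K_hat(t-1) + eps(t)/alpha(t)

    sum_y += y
    sum_u += u

# at every t the recursion reproduces the batch estimate (5) exactly
assert abs(K_hat - sum_y / sum_u) < 1e-9
```

Note that after each measurement pair has been used, it can indeed be discarded: only K̂ and α are stored.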
§3. SYSTEM AND SIGNAL MODELS
Linearization of the process model is a generally accepted procedure and has thus been the basis of most algorithms. A typical single-input single-output (SISO) model is a linear difference equation.
Consider a dynamical system with input signal {u(t )} and output signal {y (t )} .
Suppose that these signals are sampled in discrete time t=1, 2, 3,… and that the
sampled values can be related through the linear difference equation:
y(t) + a1 y(t−1) + ... + a_na y(t−na) = b1 u(t−1) + ... + b_nb u(t−nb) + v(t)   (8)
Introducing the backward shift operator q^{-1} (q^{-1} y(t) = y(t−1)) and the polynomials
A(q^{-1}) = 1 + a1 q^{-1} + ... + a_na q^{-na}
B(q^{-1}) = b1 q^{-1} + ... + b_nb q^{-nb}   (9)
the model (8) can be written more compactly as
A(q^{-1}) y(t) = B(q^{-1}) u(t) + v(t)   (10)
The model (8) or (10) describes the dynamic relationship between the input and the
output signals. It is expressed in terms of the parameter vector
θᵀ = [a1 ... a_na; b1 ... b_nb]   (11a)
We shall frequently express the relation (8) or (10) in terms of the parameter vector.
Introduce the vector of lagged input-output data,
φᵀ(t) = [−y(t−1) ... −y(t−na); u(t−1) ... u(t−nb)]   (11b)
Then (8) can be rewritten as:
y (t ) = φT (t )θ + v (t ) (12)
This model describes the observed variable y(t) as an unknown linear combination of
the components of the observed vector φ(t ) plus noise. Such a model is called a
linear regression in statistics and is a very common type of model. The components
of φ(t ) are called regression variables or regressors. In control systems, φ(t ) is
also called the measurement vector and θ the parameter vector.
If the character of the disturbance term v(t) is not specified, we can think of
yˆ (t ) = φT (t ) ⋅ θ (13)
as a natural guess or “prediction” of what y(t) is going to be, having observed
previous values of y(k), u(k), k=t-1,t-2,… This guess depends of course on the model
parameters θ . The expression (13) becomes a prediction in the exact statistical
sense, if {v (t )} in (12) is a sequence of uncorrelated random variables with zero
mean values. We shall use the term “white noise” for such a sequence and denote it
by {e(t )} .
An important feature of the set of models discussed until now is that the “prediction”
yˆ (t ) in (13) is linear in the parameter vector θ . This makes the estimation of θ
simple.
Since the disturbance term v(t) in the model (12) corresponds to an “equation error”
in the difference equation (8), methods to estimate θ in (12) are often known as
equation error methods.
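As a small illustration of the linear-regression form (11)-(13), the sketch below builds φ(t) for a second-order model (na = nb = 2; the parameter values are chosen arbitrarily) and checks that, in the noise-free case v(t) = 0, the prediction (13) reproduces the output of the difference equation (8) exactly:

```python
na, nb = 2, 2
a = [-1.5, 0.7]          # a1, a2 (illustrative values)
b = [1.0, 0.5]           # b1, b2
theta = a + b            # theta^T = [a1 a2 ; b1 b2], cf. (11a)

def phi(y, u, t):
    """Lagged data vector (11b): [-y(t-1) -y(t-2); u(t-1) u(t-2)]."""
    return [-y[t - 1], -y[t - 2], u[t - 1], u[t - 2]]

def predict(y, u, t):
    """One-step prediction (13): y_hat(t) = phi(t)^T theta."""
    return sum(pi * th for pi, th in zip(phi(y, u, t), theta))

# simulate the noise-free difference equation (8) with v(t) = 0
u = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
y = [0.0, 0.0]           # zero initial conditions
for t in range(2, len(u)):
    y.append(-a[0]*y[t-1] - a[1]*y[t-2] + b[0]*u[t-1] + b[1]*u[t-2])

# with v(t) = 0 the prediction (13) reproduces the output exactly
assert all(abs(predict(y, u, t) - y[t]) < 1e-12 for t in range(2, len(u)))
```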
We could add flexibility to the model (10) by also modeling the disturbance term v(t).
Suppose that this can be described as a moving average (MA) of a white noise
sequence {e(t )} :
v(t) = C(q^{-1}) e(t),  C(q^{-1}) = 1 + c1 q^{-1} + ... + c_nc q^{-nc}
so that the model (10) becomes
A(q^{-1}) y(t) = B(q^{-1}) u(t) + C(q^{-1}) e(t)   (15)
This is known as an ARMAX model. The reason for this term is that the model is a
combination of an autoregressive (AR) part A(q −1 )y (t ) , a moving average (MA) part
C (q −1 )e(t ) , and a control part B(q −1 )u(t ) . The control signal is in the econometric
literature known as the eXogenous variable, hence the X. In control literature also the
name CARMA model is used (Controlled AutoRegressive Moving Average).
The dynamics of the model (15) are expressed in terms of the parameter vector:
θᵀ = [a1 ... a_na; b1 ... b_nb; c1 ... c_nc]
Since the model (15) also provides us with a statistical description of the
disturbances, we can compute a properly defined prediction of the output y(t).
When no input is present, the use of (15) means that we are describing the signal
{y (t )} as an ARMA process. This is a very common type of model for stochastic
signals.
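A hypothetical first-order ARMAX model (15) can be simulated directly from its difference-equation form; the polynomial orders and coefficients below are illustrative only:

```python
import random
random.seed(1)

a1, b1, c1 = -0.8, 1.0, 0.5   # A = 1 + a1 q^-1, B = b1 q^-1, C = 1 + c1 q^-1

N = 500
u = [random.choice((-1.0, 1.0)) for _ in range(N)]   # input sequence
e = [random.gauss(0.0, 0.1) for _ in range(N)]       # white noise {e(t)}

y = [0.0]
for t in range(1, N):
    # y(t) = -a1 y(t-1) + b1 u(t-1) + e(t) + c1 e(t-1)
    y.append(-a1*y[t-1] + b1*u[t-1] + e[t] + c1*e[t-1])
```

Setting u identically zero in the same recursion describes {y(t)} as an ARMA process.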
Notice that the models are used to describe a stochastic dynamical system with an input {u(t)} and an output {y(t)}, as well as to describe the properties of a stochastic signal.
§4. RECURSIVE LEAST SQUARES (RLS)
Many methods are possible for off-line system identification, where a finite amount of
plant data must be obtained and subsequently employed to obtain system parameter
estimates. An obvious approach to recursive identification is to take any off-line method and modify it so that it meets the constraints of on-line computation: θ̂(t) must be formed from θ̂(t−1) and the current data, using a fixed amount of computation and storage per time step. This leads to an algorithm of the general form
θ̂(t) = f(θ̂(t−1), P(t−1), φ(t), y(t))   (16)
P(t) = g(θ̂(t−1), P(t−1), φ(t))   (17)
which is a generalization of the structure (6,7). Here f(·,·,·) and g(·,·,·) are known
functions of the previous estimate θ̂ , current data φ and the auxiliary variable P .
The only information that consequently needs to be stored at time t is {θ̂(t), P(t)}. This quantity is updated with a fixed algorithm, with a number
of operations that does not depend on time t. The choice of the functions f and g in
(16,17) leads to several recursive identification methods. In this text the intention is to
take a brief look at some commonly encountered algorithms.
The first technique to be discussed is in fact that of Recursive Least Squares (RLS)
which is popular not only because of its relatively low computational requirements but
also because it is straightforward to understand.
Consider the difference equation model (12). At time instant t-1, we actually know not
only θˆ (t − 1) but also φT (t ) = [ − y (t − 1),...; u(t − 1),...] . With regard to (12) as the system
equation, a guess can therefore be made as to what the next output signal, yˆ (t ) , will
be, i.e.
yˆ (t ) = φT (t ) ⋅ θˆ (t − 1) (18)
As the character of the disturbance v(t) in (12) is not specified, the best we can do as
a first step is to forget about it in the prediction (as its average is zero).
Once the new output signal is measured, the error in prediction can be found as
ε (t ) = y (t ) − yˆ (t ) (19)
Intuitively, when the noise signal v(t) is relatively small: if our parameter estimates θ̂ are fairly close to their actual values θ, then the error ε(t) should also be small; if however our estimates θ̂ are a poor approximation to θ, then one would expect ε(t) to be large. By taking into account the magnitude of the error ε(t) it is possible to improve the parameter estimates by means of the equation:
θˆ (t ) = θˆ (t − 1) + K(t )ε (t ) (20a)
such that for any particular K(t ) , if ε (t ) is small, very little change is made to our
estimates whereas for a large ε (t ) a lot of alteration is required (Prediction Error
Identification Method).
It is now apparent that the choice of K(t) is important: e.g. K(t) = 0 for all t is clearly not ideal, since the estimates would then never be updated. So how do we find a better choice, and is it possible to calculate a 'best' choice?
Initially let us investigate the difference between the actual and estimated parameter
vectors,
∆(t ) = θ(t ) − θˆ (t ) (21)
It is straightforward to show that
∆(t) = [I − K(t)φᵀ(t)] ∆(t−1) − K(t)v(t)   (22)
where I is the identity matrix. Equation (22) then shows that if rapid updating of the
parameter estimates is required then K(t ) must be large, although this will result in
large perturbations due to the K(t )v (t ) term. Conversely if K(t ) is chosen to be small,
in order to achieve better noise rejection, then the updating of the estimates will also
be sluggish.
A sensible choice for K(t ) is one which minimizes the sum of the squared error terms
With
(1/t) P^{-1}(t) = R(t) = E{φ(t)φᵀ(t)}   (23)
where E{...} signifies the stochastic expected value, it is shown in the appendix that the recursively calculated least squares estimate is found with
K(t) = P(t−1)φ(t) / (1 + φᵀ(t)P(t−1)φ(t))   (24a)
     = P(t)φ(t)   (24b)
and
P(t) = P(t−1) − P(t−1)φ(t)φᵀ(t)P(t−1) / (1 + φᵀ(t)P(t−1)φ(t))   (25)
The RLS parameter estimator consists of three equations, namely (20), (24) and (25)
and these are all recalculated at each time instant.
- If the disturbance v(t) in (12) is a zero-mean white noise sequence, then the estimate sequence {θ̂(t)} will converge to the true parameter vector θ. If, however, v(t) is colored noise, then the sequence {θ̂(t)} will converge, but will be biased away from θ.
- The equations (24) and (25) may result in divergence due to numerical problems alone if care is not taken. This problem of numerical instability might occur after several tens of thousands of samples. In such a case it is suggested that a numerically stable algorithm such as UD factorization be used in order to avoid this possibility (Bierman, 1977).
- Some initial values must be selected in order to get the recursive estimator under way, i.e., at time t=0 values must be given to the parameter estimates θ̂(0) and the covariance matrix P(0). The latter gives an indication of our uncertainty about the estimated parameter values: the covariance matrix of the estimates is given by σ_v² P(t), where σ_v² = E{v²(t)} is the disturbance variance. Relatively high values, e.g. P(0) = 1000·I, are normal practice; such a choice indicates little confidence in our initial parameter estimates θ̂(0) and causes the first few recursions of the estimator to fluctuate wildly before steadier estimates are obtained.
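Putting the pieces together, the sketch below is a minimal pure-Python RLS estimator for a first-order model: prediction (18), error (19), update (20a) with gain (24), covariance update (25), and the initialization θ̂(0) = 0, P(0) = 1000·I discussed above. The system and noise parameters are illustrative:

```python
import random
random.seed(2)

a1_true, b1_true = -0.7, 1.5   # "true" parameters of y(t) = -a1 y(t-1) + b1 u(t-1) + v(t)

theta = [0.0, 0.0]             # theta_hat(0) = 0
P = [[1000.0, 0.0],            # P(0) = 1000 I : little initial confidence
     [0.0, 1000.0]]

def rls_update(theta, P, phi, y):
    """One RLS recursion: (18)-(20a) with gain (24) and covariance (25)."""
    Pphi = [P[0][0]*phi[0] + P[0][1]*phi[1],          # P(t-1) phi(t)
            P[1][0]*phi[0] + P[1][1]*phi[1]]
    denom = 1.0 + phi[0]*Pphi[0] + phi[1]*Pphi[1]     # 1 + phi^T P phi
    K = [Pphi[0]/denom, Pphi[1]/denom]                # gain (24)
    eps = y - (phi[0]*theta[0] + phi[1]*theta[1])     # prediction error (19)
    theta = [theta[0] + K[0]*eps, theta[1] + K[1]*eps]    # update (20a)
    P = [[P[i][j] - Pphi[i]*Pphi[j]/denom for j in range(2)]   # update (25)
         for i in range(2)]
    return theta, P

y_prev = 0.0
for t in range(200):
    u_prev = random.choice((-1.0, 1.0))               # input u(t-1)
    phi = [-y_prev, u_prev]                           # phi(t), cf. (11b)
    y = -a1_true*y_prev + b1_true*u_prev + random.gauss(0.0, 0.05)
    theta, P = rls_update(theta, P, phi, y)
    y_prev = y

print(theta)   # estimates should be close to [a1_true, b1_true]
```

With white measurement noise, as here, the estimates converge to the true parameter values, in line with the first remark above.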
§5. RECURSIVE INSTRUMENTAL VARIABLES (RIV)
As long as the data vector φ(t) and the disturbance v(t) are uncorrelated, the RLS technique yields parameter estimates which converge to their true values. When φ(t) and v(t) are correlated, however, the expectation is that the estimates will converge to a parameter set which is biased away from the true values.
Instrumental Variables employs the prediction error term ε (t ) obtained from (18,19):
ε (t ) = y (t ) − φT (t )θˆ (t − 1)
This error is then used in the standard estimate update equation (20), i.e.
θˆ (t ) = θˆ (t − 1) + K(t )ε (t )
in a similar fashion to the method of RLS. However K(t ) is obtained from the
equation (appendix, equation A14b)
K(t) = P(t−1)z(t) / (1 + φᵀ(t)P(t−1)z(t))   (26)
in which z(t) is the pseudo-data vector, a common choice being
z(t) = [−y_m(t−1) ... −y_m(t−na); u(t−1) ... u(t−nb)]
with y_m(t) the output of a noise-free model of the system (see the appendix). The covariance update becomes
P(t) = P(t−1) − P(t−1)z(t)φᵀ(t)P(t−1) / (1 + φᵀ(t)P(t−1)z(t))   (29b)
In summary then, as far as implementation is concerned, there is no difference between RLS and RIV except for the calculation of the gain K(t), which in the case of RIV is obtained by means of the pseudo-data vector z(t), found in terms of the model output y_m(t).
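Since RIV differs from RLS only in the gain (26) and the covariance update (29b), a single update step can be sketched as below (two parameters, illustrative numbers). Generating z(t) from the model output y_m(t) is omitted here; note that choosing z(t) = φ(t) recovers exactly one RLS step:

```python
def riv_update(theta, P, phi, z, y):
    """One RIV recursion with gain (26) and covariance update (29b)."""
    Pz = [P[0][0]*z[0] + P[0][1]*z[1],                # P(t-1) z(t)
          P[1][0]*z[0] + P[1][1]*z[1]]
    denom = 1.0 + phi[0]*Pz[0] + phi[1]*Pz[1]         # 1 + phi^T P z
    K = [Pz[0]/denom, Pz[1]/denom]                    # gain (26)
    eps = y - (phi[0]*theta[0] + phi[1]*theta[1])     # prediction error
    theta = [theta[0] + K[0]*eps, theta[1] + K[1]*eps]
    phiP = [phi[0]*P[0][0] + phi[1]*P[1][0],          # phi^T(t) P(t-1)
            phi[0]*P[0][1] + phi[1]*P[1][1]]
    P = [[P[i][j] - Pz[i]*phiP[j]/denom for j in range(2)]   # update (29b)
         for i in range(2)]
    return theta, P

# with z(t) = phi(t) an RIV step coincides with an RLS step
theta0 = [0.0, 0.0]
P0 = [[1000.0, 0.0], [0.0, 1000.0]]
phi = [0.5, -1.0]                                     # illustrative data
th, P = riv_update(theta0, P0, phi, phi, 2.0)
```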
§6. RECURSIVE EXTENDED LEAST SQUARES (RELS)
In the section on 'System and Signal Models' it was already indicated that, if it is known a priori that the disturbance v(·) is coloured instead of white noise, we could add flexibility to the model (10) by also modelling the disturbance term v(t). This led to the ARMAX model (15) (with e(·) being white noise):
y(t) + a1 y(t−1) + ... + a_na y(t−na) = b1 u(t−1) + ... + b_nb u(t−nb) + e(t) + c1 e(t−1) + ... + c_nc e(t−nc)   (30)
This model looks just like the linear regression (12): with
ξᵀ(t) = [−y(t−1) ... −y(t−na); u(t−1) ... u(t−nb); e(t−1) ... e(t−nc)]
we can write
y(t) = ξᵀ(t)θ + e(t)   (31)
and we can try to apply the recursive least squares algorithm (20, 24, 25) to it for estimating θ̂:
θ̂(t) = θ̂(t−1) + K(t)[y(t) − ξᵀ(t)θ̂(t−1)]   (32)
with K(t) and P(t) computed as in (24, 25), using ξ(t) in place of φ(t).
Notice that this would lead to unbiased parameter estimates, as the disturbance e(t) in (31) is white noise. The problem is, of course, that the variables e(·) entering the ξ
vector are not measurable, and hence (32) cannot be implemented as it stands. We
have to replace the components e(.) with some estimate of them. From (30) we have
e(t) = y(t) + a1 y(t−1) + ... + a_na y(t−na) − b1 u(t−1) − ... − b_nb u(t−nb) − c1 e(t−1) − ... − c_nc e(t−nc)
With
φᵀ(t) = [−y(t−1) ... −y(t−na); u(t−1) ... u(t−nb); ê(t−1) ... ê(t−nc)]   (34)
where the estimates ê(·) of the noise terms are obtained as the prediction residuals
ê(t) = y(t) − φᵀ(t)θ̂(t)   (35)
An obvious algorithm for estimating θ̂ is now obtained from (32) by replacing ξ(t ) by
φ(t ) , computed according to (34,35). This gives the recursive extended least
squares (RELS) algorithm.
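A minimal RELS sketch for a first-order ARMAX system (na = nb = nc = 1, illustrative values): the ordinary RLS recursion is applied with the extended regressor (34), where the unmeasurable noise terms e(t−i) are replaced by the running residuals ê(t−i):

```python
import random
random.seed(3)

a1, b1, c1 = -0.7, 1.0, 0.4    # "true" ARMAX parameters (na = nb = nc = 1)
n = 3                          # theta = [a1, b1, c1]
theta = [0.0]*n
P = [[1000.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rls_step(theta, P, phi, y):
    """Ordinary RLS recursion, here fed with the extended regressor (34)."""
    Pphi = [sum(P[i][j]*phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i]*Pphi[i] for i in range(n))
    K = [x/denom for x in Pphi]
    eps = y - sum(phi[i]*theta[i] for i in range(n))
    theta = [theta[i] + K[i]*eps for i in range(n)]
    P = [[P[i][j] - Pphi[i]*Pphi[j]/denom for j in range(n)] for i in range(n)]
    return theta, P

y_prev, e_prev, ehat_prev = 0.0, 0.0, 0.0
for t in range(2000):
    u_prev = random.choice((-1.0, 1.0))
    e = random.gauss(0.0, 0.1)
    y = -a1*y_prev + b1*u_prev + e + c1*e_prev        # ARMAX data, cf. (30)
    phi = [-y_prev, u_prev, ehat_prev]                # extended regressor (34)
    theta, P = rls_step(theta, P, phi, y)
    ehat_prev = y - sum(phi[i]*theta[i] for i in range(n))   # residual as e_hat(t)
    y_prev, e_prev = y, e

print(theta)   # estimates should approach [a1, b1, c1] = [-0.7, 1.0, 0.4]
```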
§7. TIME VARYING SYSTEMS
Throughout the discussion on recursive techniques thus far it has been assumed that
a vector of parameter estimates will converge, under certain conditions, to a vector
which consists of the true values. The underlying implication in this is that the
parameters within the true vector will remain where they are in order to be converged
upon. However, in many practical situations the system under consideration will be
affected by ageing, modifications and unmodelled ambient conditions or unmodelled
dynamics. Each of these can cause the ‘actual’ or ‘true’ system parameters to vary
with respect to time. Usually, parameter variations take the form of a steady drift, although where modifications are made or when faults occur, a rapid alteration
can occur. The result of this is that if a recursive parameter estimator is required to
have an up-to-date picture of the system characteristics then it must be able to track
any system parameter variations.
In this section the method of RLS is reconsidered in order to show how it can be
modified to cope with time-varying systems. Similar modifications can however be
made to the other algorithms.
The most straightforward way of dealing with time-varying systems is based on the reasoning that, when the system itself is time varying, information from some time earlier is less representative of the system than the data just obtained: the earlier information reflects what the system was like in the past, rather than what it is like now.
A common modification of the original RLS method is thus to weight new data more
heavily than old data. This can be done by including an exponential weighting factor
(called forgetting factor) in the performance index (appendix equation A2):
V[θ] = ∑_{k=1}^{t} λ^{t−k} [y(k) − φᵀ(k)θ]²   (36)
where λ is the exponential weighting factor, 0 < λ ≤ 1. When λ = 1, all data are weighted equally. For 0 < λ < 1, more weight is placed on recent measurements than
on older measurements. Following the derivation shown in the appendix for the
original algorithm, the performance index given by (36) results in the following
recursive least squares algorithm:
θ̂(t) = θ̂(t−1) + K(t)[y(t) − φᵀ(t)θ̂(t−1)],  K(t) = P(t−1)φ(t) / (λ + φᵀ(t)P(t−1)φ(t))   (37a)
P(t) = [P(t−1) − K(t)φᵀ(t)P(t−1)] / λ   (37b)
It can be seen from (37b) that the effect of an exponential weighting factor λ < 1 is to prevent the elements of P from becoming too small. This maintains the
sensitivity of the algorithm and allows new data to continue to affect the parameter
estimates. On the other hand, when y and u are close to zero, then P(t − 1)φ(t ) → 0 ,
and P(t ) ≈ P(t − 1) / λ . Hence P grows exponentially until φ changes. Equation (37a)
shows how bursts in θˆ (t ) can occur for large P, especially when a perturbation
signal is introduced. This phenomenon is known as estimator windup or
covariance windup.
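Covariance windup is easy to demonstrate in the scalar case. The update used below is the standard exponential-forgetting form of RLS (assumed here, consistent with the behaviour described for (37)): with no excitation (φ ≈ 0) the gain vanishes and P(t) = P(t−1)/λ, so P grows exponentially:

```python
lam = 0.95            # forgetting factor
theta = 0.0           # scalar parameter estimate
P = 1.0

def ff_rls(theta, P, phi, y):
    """Scalar RLS with exponential forgetting (standard form, assumed)."""
    K = P*phi / (lam + phi*P*phi)
    theta = theta + K*(y - phi*theta)
    P = (P - K*phi*P) / lam
    return theta, P

# ten well-excited, noise-free samples (phi = 1, theta_true = 2): P shrinks
for t in range(10):
    theta, P = ff_rls(theta, P, 1.0, 2.0)
P_excited = P

# fifty samples with no excitation (phi = 0): K = 0 and P(t) = P(t-1)/lam
for t in range(50):
    theta, P = ff_rls(theta, P, 0.0, 0.0)

assert P > 10.0 * P_excited        # covariance windup: P has blown up
```

Once excitation returns, the inflated P produces a very large gain K(t), which is exactly the burst phenomenon described above.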
When very little excitation of the process signals occurs, as discussed earlier, small model errors can lead to large parameter changes (see (37a)). A constant forgetting factor λ < 1 has two main disadvantages. First, the algorithm is more sensitive to noise as well as to parameter changes, which can cause the parameter estimates to drift erroneously. The quality of the estimates can be improved if a perturbation signal is added to the process input.
The second disadvantage is that with λ<1, the elements of P may become
excessively large with time. This in turn causes the algorithm to become overly
sensitive to parameter changes and noise, resulting in large fluctuations and drifting
in the parameter estimates.
It is apparent that simply selecting a constant value for λ will yield unsatisfactory
performance for one reason or another. The use of criteria to adapt the value of the
forgetting factor according to the current situation is a must for successful
applications.
§9. APPLICATION OF RECURSIVE IDENTIFICATION
To give a better feeling for the role recursive identification plays in applications we shall consider some problems from different areas.
Example 1 (Ship Steering)
The steering dynamics of a ship depend on a number of things. The ship's shape and size, its loading and trim, as well as the water depth, are important factors.
Some of these may vary (loading, water depth) during a journey. Obviously, the wind
and wave disturbances that affect the steering may also rapidly change. Therefore
the regulator must be constantly retuned to match the current dynamics of the
system; in fact, it is desirable that the regulator retunes itself. This can be done by
estimating the ship parameters by means of a recursive parameter estimator.
Many control problems exhibit features similar to the foregoing example. Airplanes,
missiles and automobiles have dynamic properties that depend on speed, loading,
etc. The dynamic properties of electric-motor drives change with the load. Machinery
such as that in paper-making plants is affected by many factors that change in an
unpredictable manner.
Chemical process control is another major field of application. The area of adaptive
control is concerned with the study and design of controllers and regulators that
adjust to varying properties of the controlled object. This is currently a very active
research area. A specific technique (EPSAC) is described in a separate text (De
Keyser, 2003).
Example 2 (Prediction of Power Demand)
Prediction of the power demand of course requires some sort of model of its random component. It seems reasonable to suppose that the mechanism that
generates this random contribution to the power load depends on circumstances e.g.,
the weather, which themselves may vary with time. Therefore it would be desirable to
use a predictor that adapts itself to changing properties of the signal to be predicted.
The foregoing is an example of adaptive prediction; it has been found that adaptive
prediction can be applied to a wide variety of problems. The operator guide
application described in a separate paper is another example (De Keyser & Van
Cauwenberghe, 1982).
Example 3 (Digital Transmission of Speech)
Consider the transmission of speech over a communication channel. This is now
more often done digitally, which means that the analog speech signal is quantized to
a number of bits, which are transmitted. The transmission line has limited capacity,
and it is important to use it as efficiently as possible. If one predicts the “next sample
value” of the signal both at the transmitter and at the receiver, one need transmit only
the difference between the actual and the predicted value (the “prediction error”).
Since the prediction error is typically much smaller than the signal itself, it requires
fewer bits when transmitted; hence the line is more efficiently used. This technique is
known as predictive coding in communication theory. Now the prediction of the next
value very much depends on the character of the transmitted signal. In the case of
speech, this character significantly varies with the different sounds (phonemes) being
pronounced. Efficient use of the predictive encoding procedure therefore requires
that the predictor is based on real-time recursive identification of the signal
characteristic parameters.
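Adaptive prediction of this kind can be sketched with the tools already developed: an AR(2) model of the signal is identified recursively with RLS, and the one-step prediction error is what would be transmitted. The signal model and all values below are illustrative; the point is that the prediction-error power is a small fraction of the signal power, so fewer bits are needed:

```python
import random
random.seed(4)

n = 2
theta = [0.0, 0.0]
P = [[100.0, 0.0], [0.0, 100.0]]

def rls_step(theta, P, phi, y):
    """Ordinary RLS step; returns updated estimate, covariance and error."""
    Pphi = [sum(P[i][j]*phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i]*Pphi[i] for i in range(n))
    K = [x/denom for x in Pphi]
    eps = y - sum(phi[i]*theta[i] for i in range(n))   # what would be transmitted
    theta = [theta[i] + K[i]*eps for i in range(n)]
    P = [[P[i][j] - Pphi[i]*Pphi[j]/denom for j in range(n)] for i in range(n)]
    return theta, P, eps

# strongly correlated AR(2) test signal (stand-in for a speech segment)
y = [0.0, 0.0]
for t in range(2, 2000):
    y.append(1.6*y[t-1] - 0.8*y[t-2] + random.gauss(0.0, 0.1))

sig_pow = err_pow = 0.0
for t in range(2, len(y)):
    phi = [y[t-1], y[t-2]]                             # predictor regressors
    theta, P, eps = rls_step(theta, P, phi, y[t])
    sig_pow += y[t]**2
    err_pow += eps**2

assert err_pow < 0.15 * sig_pow   # error power is a small fraction of signal power
```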
Example 5 (Monitoring and Failure Detection)
Many systems must be constantly monitored to detect possible failures, or to decide
when a repair or replacement must be made. Such monitoring can sometimes be done manually. However, in complex highly automated systems with
stringent safety requirements, the monitoring itself must be computerized. This
means that measured signals from the systems must be processed to infer the
current (dynamic) properties of the system: based on this data, it is then decided
whether the system has undergone critical or undesired changes. The procedure
must of course be applied on-line so that any decision is not unnecessarily delayed.
§10. CONCLUSIONS
Based on the results to date, there are several conclusions that can be drawn about
the features of a successful estimation scheme:
1. The Recursive Least Squares (RLS) method is the most popular estimation
technique and appears to exhibit rapid convergence when properly applied.
Recursive Extended Least Squares (RELS) using pseudolinear regression
seems to be a satisfactory way to treat the non-white noise case, although the
parameters in the C-polynomial do not always need to be estimated. In the
latter situation Recursive Instrumental Variables (RIV) might be a good
alternative.
§11. APPENDIX: Formulae Derivations
Consider the weighted least squares criterion
V[θ] = ∑_{k=1}^{t} α_k [y(k) − φᵀ(k)θ]²   (A2)
where the α_k are positive numbers. The inclusion of the coefficients α_k in the criterion (A2) allows us to give different weights to different measurements. The minimizing estimate is
θ̂(t) = [∑_{k=1}^{t} α_k φ(k)φᵀ(k)]^{-1} ∑_{k=1}^{t} α_k φ(k)y(k)   (A3)
provided the inverse exists. This is the celebrated least squares estimate. For our current purposes it is important to note that the expression (A3) can be rewritten in a recursive fashion. To prove this, we proceed as follows. Denote
S(t) = ∑_{k=1}^{t} α_k φ(k)φᵀ(k)
Hence
θ̂(t) = S^{-1}(t) [∑_{k=1}^{t−1} α_k φ(k)y(k) + α_t φ(t)y(t)]
     = S^{-1}(t) [S(t−1)θ̂(t−1) + α_t φ(t)y(t)]
     = S^{-1}(t) {S(t)θ̂(t−1) + α_t φ(t)[−φᵀ(t)θ̂(t−1) + y(t)]}
so that
θ̂(t) = θ̂(t−1) + S^{-1}(t)φ(t)α_t [y(t) − φᵀ(t)θ̂(t−1)]   (A4a)
and
S(t) = S(t−1) + α_t φ(t)φᵀ(t)   (A4b)
The algorithm (A4) is not, however, well suited for computation as it stands, since a
matrix has to be inverted in each time step. It is more natural to introduce
P(t) = S^{-1}(t)
and update P(t ) directly, instead of using (A4b). This is accomplished by the so-called
matrix inversion lemma, which we now state.
[A + BCD]^{-1} = A^{-1} − A^{-1}B[DA^{-1}B + C^{-1}]^{-1}DA^{-1}   (A5)
Proof: Multiply the right-hand side of (A5) by A+BCD from the right. This gives
I + A^{-1}BCD − A^{-1}B[DA^{-1}B + C^{-1}]^{-1}D − A^{-1}B[DA^{-1}B + C^{-1}]^{-1}DA^{-1}BCD
= I + A^{-1}B[DA^{-1}B + C^{-1}]^{-1} {[DA^{-1}B + C^{-1}]CD − D − DA^{-1}BCD}
= I + A^{-1}B[DA^{-1}B + C^{-1}]^{-1} {0} = I
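The lemma is also easy to verify numerically for the rank-one case used here (A = P^{-1}(t−1), B = φ(t), C = α_t, D = φᵀ(t)); the matrices below are illustrative:

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[ M[1][1]/det, -M[0][1]/det],
            [-M[1][0]/det,  M[0][0]/det]]

A = [[4.0, 1.0], [1.0, 3.0]]          # stands in for P^-1(t-1)
phi = [2.0, -1.0]                     # B = phi, D = phi^T
alpha = 0.5                           # C = alpha (scalar)

# left-hand side: inv(A + alpha * phi phi^T)
M = [[A[i][j] + alpha*phi[i]*phi[j] for j in range(2)] for i in range(2)]
lhs = inv2(M)

# right-hand side of (A5):
#   A^-1 - A^-1 phi [phi^T A^-1 phi + 1/alpha]^-1 phi^T A^-1
Ai = inv2(A)
Aiphi = [Ai[0][0]*phi[0] + Ai[0][1]*phi[1],
         Ai[1][0]*phi[0] + Ai[1][1]*phi[1]]
s = phi[0]*Aiphi[0] + phi[1]*Aiphi[1] + 1.0/alpha
rhs = [[Ai[i][j] - Aiphi[i]*Aiphi[j]/s for j in range(2)] for i in range(2)]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
```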
Applying the lemma (A5) to (A4b), with A = P^{-1}(t−1), B = φ(t), C = α_t and D = φᵀ(t), gives
P(t) = [P^{-1}(t−1) + φ(t)α_t φᵀ(t)]^{-1}
     = P(t−1) − P(t−1)φ(t)[φᵀ(t)P(t−1)φ(t) + 1/α_t]^{-1} φᵀ(t)P(t−1)
or
P(t) = P(t−1) − P(t−1)φ(t)φᵀ(t)P(t−1) / (1/α_t + φᵀ(t)P(t−1)φ(t))   (A6)
The advantages of (A6) over (A4b) are obvious. The inversion of a square matrix is
replaced by inversion of a scalar. From (A6) we also find that
α_t P(t)φ(t) = α_t P(t−1)φ(t) − α_t P(t−1)φ(t)φᵀ(t)P(t−1)φ(t) / (1/α_t + φᵀ(t)P(t−1)φ(t))
             = P(t−1)φ(t) / (1/α_t + φᵀ(t)P(t−1)φ(t))   (A7)
Thus the least squares estimate θ̂(t) defined by (A3) can be recursively calculated by means of
θ̂(t) = θ̂(t−1) + K(t)[y(t) − φᵀ(t)θ̂(t−1)]   (A8a)
K(t) = P(t−1)φ(t) / (1/α_t + φᵀ(t)P(t−1)φ(t))   (A8b)
P(t) = P(t−1) − K(t)φᵀ(t)P(t−1)   (A8c)
- Initial Conditions
Any recursive algorithm requires some initial value to be started up. In (A8) we
need θˆ (0) and P(0) . Since we derived (A8) from (A3) under the assumption
that S(t ) is invertible, an exact relationship between these two expressions can
hold only if (A8) is initialized at a time t0 when S(t0 ) is invertible. Typically,
S(t ) becomes invertible at time t0 = dim φ(t ) = dim θ(t ) . Thus, strictly speaking,
the proper initial values for (A8) are obtained if we start the recursion at time
t0 , for which
P(t0) = [∑_{k=1}^{t0} α_k φ(k)φᵀ(k)]^{-1}
θ̂(t0) = P(t0) ∑_{k=1}^{t0} α_k φ(k)y(k)
It is more common, though, to start the recursion at t=0 with some invertible
matrix P(0) and a vector θˆ (0) . The estimates resulting from (A8) are then
θ̂(t) = [P^{-1}(0) + ∑_{k=1}^{t} α_k φ(k)φᵀ(k)]^{-1} [P^{-1}(0)θ̂(0) + ∑_{k=1}^{t} α_k φ(k)y(k)]   (A9)
This can be seen by verifying that (A9) obeys the recursion (A8) with these
initial conditions.
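This verification can also be carried out numerically: for a scalar parameter with α_k = 1, the recursion (A8) started from arbitrary θ̂(0), P(0) reproduces the closed-form expression (A9) exactly:

```python
import random
random.seed(5)

theta0, P0 = 0.3, 10.0            # arbitrary initial values
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
# each pair is (phi(k), y(k)); all values illustrative

# recursion (A8) with alpha_t = 1, scalar P
theta, P = theta0, P0
for phi, y in data:
    K = P*phi / (1.0 + phi*P*phi)            # gain (A8b)
    theta = theta + K*(y - phi*theta)        # update (A8a)
    P = P - K*phi*P                          # covariance update

# closed form (A9)
S = 1.0/P0 + sum(phi*phi for phi, _ in data)
closed = (theta0/P0 + sum(phi*y for phi, y in data)) / S

assert abs(theta - closed) < 1e-10
```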
By comparing (A9) to (A3), we see that the relative importance of the initial values decays with time, as the magnitudes of the sums increase. Also, as P^{-1}(0) → 0 the recursive estimate approaches the off-line estimate. Therefore, choosing P(0) large makes the influence of the arbitrary initial values θ̂(0) and P(0) on the estimates small.
- Asymptotic Properties
To investigate how the estimate (A3) behaves when N becomes large, we
assume that the data actually have been generated by
y (t ) = φT (t )θ 0 + v (t ) (A10)
Inserting this expression for y(t) into (A3) gives
θ̂(N) = [∑_{t=1}^{N} α_t φ(t)φᵀ(t)]^{-1} {∑_{t=1}^{N} α_t [φ(t)φᵀ(t)θ_0 + φ(t)v(t)]}
      = θ_0 + [(1/N) ∑_{t=1}^{N} α_t φ(t)φᵀ(t)]^{-1} (1/N) ∑_{t=1}^{N} α_t φ(t)v(t)   (A11)
The term (1/N) ∑_{t=1}^{N} α_t φ(t)v(t) will, under weak conditions, converge to its expected value as N → ∞, according to the law of large numbers. This expected value depends on the correlation between the disturbance term v(t) and the data vector φ(t). Unless v(t) is white noise, the lagged outputs in φ(t) will in general be correlated with v(t). This means that we may expect θ̂(N) not to tend to θ_0 as N → ∞. The idea of the instrumental variable method is therefore to replace φ(t) in (A3) by a vector z(t) satisfying the following conditions:
• z(t ) and v(t) are uncorrelated
• v(t) has zero mean
• the matrix lim_{N→∞} (1/N) ∑_{t=1}^{N} z(t)φᵀ(t) is invertible.
Under these conditions the estimate
θ̂(N) = [∑_{t=1}^{N} z(t)φᵀ(t)]^{-1} ∑_{t=1}^{N} z(t)y(t)   (A12)
tends to θ_0 as N → ∞.
The estimate (A12) is known as the instrumental variable (IV) estimate. The vectors
z(t ) are referred to as the instrumental variables.
It is obvious that the estimate (A12) can be rewritten in a recursive fashion, just as
the least squares estimate in (A8). We then find that
θ̂(t) = θ̂(t−1) + K(t)[y(t) − φᵀ(t)θ̂(t−1)]   (A14a)
K(t) = P(t−1)z(t) / (1 + φᵀ(t)P(t−1)z(t))   (A14b)
P(t) = P(t−1) − K(t)φᵀ(t)P(t−1)   (A14c)
We have not yet discussed the choice of the instrumental variables z(t ) . Loosely
speaking, they should be sufficiently correlated with φ(t ) to ensure the invertibility
condition, but uncorrelated with the system noise terms. A common choice is
z(t ) = [ − y m (t − 1)... − y m (t − na ); u(t − 1)...u(t − nb )]
where y_m(t) is the output of a deterministic system driven by the actual input u(t):
y_m(t) + a1 y_m(t−1) + ... + a_na y_m(t−na) = b1 u(t−1) + ... + b_nb u(t−nb)   (A15)
For the recursive algorithm (A14) an often used approach is to let the coefficients ai and bi in (A15) be time-dependent. Then the current estimates âi(t), b̂i(t) obtained from (A14) can be used at time t in (A15). That is, we can write:
y m (t ) = zT (t )θˆ (t ) (A16)
§12. REFERENCES:
Åström, K.J. and P. Eykhoff (1971). System identification - a survey. Automatica, 7, 123-162.
Bierman, G.J. (1977). Factorization Methods for Discrete Sequential Estimation. Academic Press, New York.