
Automatica, Vol. 29, No. 3, pp. 649-660, 1993
Printed in Great Britain.
0005-1098/93 $6.00 + 0.00
© 1993 Pergamon Press Ltd

Subspace Algorithms for the Stochastic Identification Problem*†

PETER VAN OVERSCHEE‡§|| and BART DE MOOR‡¶

A new subspace algorithm consistently identifies stochastic state space models directly from given output data, using only semi-infinite block Hankel matrices.

Key Words--System identification; stochastic systems; stochastic approximation; Kalman filters; difference equations; QR and quotient singular value decomposition.

Abstract--In this paper, we derive a new subspace algorithm to consistently identify stochastic state space models from given output data without forming the covariance matrix and using only semi-infinite block Hankel matrices. The algorithm is based on the concept of principal angles and directions. We describe how they can be calculated with QR and Quotient Singular Value Decomposition. We also provide an interpretation of the principal directions as states of a non-steady state Kalman filter bank.

1. INTRODUCTION

LET y_k ∈ ℜ^l, k = 0, 1, ..., K be a data sequence that is generated by the following system:

x_{k+1} = A x_k + w_k,   (1)
y_k = C x_k + v_k,   (2)

where x_k ∈ ℜ^n is a state vector. The vector sequence w_k ∈ ℜ^n is a process noise while the vector sequence v_k ∈ ℜ^l is a measurement noise. They are both assumed to be zero mean, white, Gaussian with covariance matrix**

E[ ( w_k ; v_k ) ( w_p^t  v_p^t ) ] = ( Q  S ; S^t  R ) δ_{kp},

where δ_{kp} is the Kronecker delta. It is assumed that the stochastic process is zero mean and stationary, i.e.: E[x_k] = 0 and E[x_k x_k^t] = Σ (say). The state covariance matrix Σ is independent of the time k. This implies that A is a stable matrix (all of its poles are strictly inside the unit circle). The central problem discussed in this paper is the identification of a state space model from the data y_k (including the determination of the system order n) and the determination of the noise covariance matrices. The state space model should be equivalent up to within second order statistics of the output.

The main contributions of this paper are the following:
• Since the pioneering papers by Akaike (1975), canonical correlations (which were first introduced by Jordan (1875) in linear algebra and then by Hotelling (1936) in the statistical community) have been used as a mathematical tool in the stochastic realization problem. In this paper we show how the approach by Akaike (1975) and others (e.g. Arun and Kung, 1990; Larimore, 1990) reduces to applying canonical correlation analysis to two matrices that are doubly infinite (i.e. have an infinite number of rows and columns). A careful analysis reveals the nature of this double infinity and we manage to reduce the canonical correlation approach to a semi-infinite matrix problem, i.e. only the number of columns needs to be very large while the number of (block) rows remains sufficiently small. This observation is extremely relevant with respect to the use of updating techniques.
• In order to find the state space model, we derive a finite-dimensional vector sequence (principal directions) which, in the case of doubly infinite block Hankel matrices, would be a valid state sequence of the stochastic

* Received 2 January 1992; received in final form 12 September 1992. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor B. Wahlberg under the direction of Editor P. C. Parks.
† The research reported in this paper was partially supported by the Belgian Program on Inter University Attraction Poles initiated by the Belgian State Science Policy Programming (Prime Minister's Office) and the European Community Research Program ESPRIT, Basic Research Action nr. 3280. The scientific responsibility is assumed by its authors.
‡ Katholieke Universiteit Leuven, Department of Electrical Engineering--ESAT, Kardinaal Mercierlaan 94, 3001 Heverlee, Belgium. email: vanoverschee@esat.kuleuven.ac.be. Fax: 32/16/221855.
§ Research Assistant of the Belgian National Fund for Scientific Research.
|| Author to whom all correspondence should be sent.
¶ Research Associate of the Belgian National Fund for Scientific Research.
** The expected value operator is denoted by E[·].

model. This sequence would correspond to the outputs of an infinite number of steady state Kalman filters with an infinite number of output measurements as inputs. For the semi-infinite matrix problem, the sequence corresponds to the output of an infinite number of non-steady state Kalman filters that have only used a finite number of output data as input. These state sequences are obtained directly from the output data, without any need for the state space model. The state space model is then derived from these sequences by solving a least squares problem. Figure 1 illustrates the difference between this approach and the classical one, where first the system matrices are identified, whereafter the states are determined through a Kalman filter. In our point of view, the notion of a state is largely underestimated in the context of system identification, while it is well accepted in the context of control system design. Most identification approaches are based on an input-output (transfer matrix) framework.
• We give a precise interpretation of the fact that state space models, obtained via principal angles and directions, are approximately balanced in the stochastic sense.
• It is a common belief that ARMA model identification requires non-linear estimation methods, requiring iterative solution techniques (see e.g. Ljung, 1987). In that framework, the main problems arise from convergence difficulties and/or the existence of local minima. With the subspace approach of this paper, this ARMA modelling is basically reduced to the solution of an eigenvalue problem, for which numerically robust and always convergent algorithms exist.
• We derive a numerically robust square root algorithm, that mainly uses the QR-decomposition and the Quotient Singular Value Decomposition (QSVD) of the triangular factors and is completely data driven instead of covariance driven. The fact that only one of the dimensions of the matrices involved needs to go to infinity is very important since then updating techniques can be used.

We should mention the work by Aoki (1987), in which a "covariance" framework for the stochastic realization is derived (in contrast to our square root version). Another aspect that we emphasize more than is done in Aoki (1987) is the careful analysis of the dynamics of the algorithm as a function of the number of block rows of the Hankel matrices involved.

This paper is organized as follows. In Section 2, we summarize the main properties of linear time-invariant stochastic processes, including the non-uniqueness of the state space description. In Section 3, we briefly present the solution via the classical realization approach, which is based on the fact that the output covariance matrices can be considered as Markov parameters of a deterministic linear time-invariant discrete time system. In Section 4, we describe how the projection of past and future block Hankel matrices of output data leads to the definition of the state sequences. In Section 5 we explore the relationship of these state sequences with the non-steady state Kalman filter. From these sequences, it is easy to derive the stochastic state space model. This will be done in Section 6. The second part of this paper consists of a description of a numerically robust algorithm to calculate the state sequences and the state space matrices. In Section 7 we provide a geometrical interpretation of the main mathematical-geometrical tool: principal angles and directions. We will also show how the state sequences can be calculated as principal directions. In Section 8, we describe a numerically robust computation scheme based on the QR and quotient singular value decomposition. We also put everything together and arrive at a new, numerically robust algorithm to calculate the state space model of the stochastic system directly from output data. Some illustrative examples are given in Section 9.

FIG. 1. The left hand side shows the new approach: first the Kalman states, then the system matrices. The right hand side is the classical approach: first the system, and then an estimate of the states.

2. LINEAR TIME-INVARIANT STOCHASTIC PROCESSES

In this section, we summarize the main properties of linear time-invariant stochastic processes, including the non-uniqueness of a state space description.

First we will develop some (well-known) structural relations for linear time-invariant stochastic processes. Since w_k and v_k are zero mean white noise vector sequences, independent

of x_k, we know that E[x_k v_k^t] = 0 and E[x_k w_k^t] = 0. Then we find the Lyapunov equation for the state covariance matrix Σ ≝ E[x_k x_k^t]:

Σ = A Σ A^t + Q.

Defining the output covariance matrices Λ_i ≝ E[y_{k+i} y_k^t], we find for Λ_0:

Λ_0 = C Σ C^t + R.   (3)

Defining

G ≝ A Σ C^t + S,   (4)

we get

Λ_i = C A^{i-1} G,   (5)
Λ_{-i} = G^t (A^{i-1})^t C^t,   i = 1, 2, ....   (6)

The model (1)-(2) can be converted into a so-called forward innovation model and a backward innovation model (see e.g. Pal, 1982). The forward innovation model is obtained by applying a Kalman filter to the stochastic system (1)-(2): z_{k+1} = A z_k + w_k^f, y_k = C z_k + v_k^f, where the forward noise sequences w_k^f and v_k^f are given by w_k^f = K_f v_k^f and v_k^f = y_k − C z_k. Here K_f is the Kalman gain: K_f = (G − A P C^t)(Λ_0 − C P C^t)^{-1} and P is the forward state covariance matrix, which can be determined from the forward Riccati equation:

P = A P A^t + (G − A P C^t)(Λ_0 − C P C^t)^{-1} (G − A P C^t)^t.   (7)
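These structural relations translate directly into a few lines of numerical linear algebra. The following sketch (Python with NumPy/SciPy; the system matrices are illustrative assumed values, not taken from the paper) computes Σ from the Lyapunov equation, forms Λ_0, G and the output covariances (3)-(5), and iterates the forward Riccati equation (7) to its fixed point P with the associated Kalman gain K_f:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative second order model (assumed values, not from the paper).
A = np.array([[0.7, 0.3], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.2, 0.05], [0.05, 0.1]])   # E[w w^t]
S = np.array([[0.01], [0.02]])             # E[w v^t]
R = np.array([[0.3]])                      # E[v v^t]

# Lyapunov equation: Sigma = A Sigma A^t + Q.
Sigma = solve_discrete_lyapunov(A, Q)

# Equations (3)-(5).
Lam0 = C @ Sigma @ C.T + R
G = A @ Sigma @ C.T + S
Lam = lambda i: C @ np.linalg.matrix_power(A, i - 1) @ G   # Lambda_i, i >= 1

# Forward Riccati equation (7), iterated to its fixed point P.
P = np.zeros_like(A)
for _ in range(500):
    P = (A @ P @ A.T
         + (G - A @ P @ C.T) @ np.linalg.inv(Lam0 - C @ P @ C.T)
         @ (G - A @ P @ C.T).T)
Kf = (G - A @ P @ C.T) @ np.linalg.inv(Lam0 - C @ P @ C.T)
print(Lam(1), "\n", P, "\n", Kf)
```

Iterating (7) from P = 0 reproduces exactly the sequence Σ_i of Theorem 1 below, which converges to P for a stable A.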
Associated with every forward model (1)-(2), there is a backward model that can be obtained as follows. Define the minimum variance estimate of x_k based on x_{k+1} as (Papoulis, 1984):

Π(x_k | x_{k+1}) = E[x_k x_{k+1}^t] (E[x_{k+1} x_{k+1}^t])^{-1} x_{k+1}.

We now have:

x_k = Π(x_k | x_{k+1}) + (x_k − Π(x_k | x_{k+1}))
    = E[x_k x_{k+1}^t](E[x_{k+1} x_{k+1}^t])^{-1} x_{k+1} + (x_k − Π(x_k | x_{k+1}))
    = E[x_k x_{k+1}^t] Σ^{-1} x_{k+1} + (x_k − Π(x_k | x_{k+1}))
    = E[x_k (x_k^t A^t + w_k^t)] Σ^{-1} x_{k+1} + (x_k − Π(x_k | x_{k+1}))
    = Σ A^t Σ^{-1} x_{k+1} + (x_k − Π(x_k | x_{k+1})).

Now define the backward state s_k = Σ^{-1} x_k; then

s_k = A^t s_{k+1} + w̄_k,   (8)

where w̄_k = Σ^{-1}(x_k − Π(x_k | x_{k+1})). For the output equation, we obtain in a similar way:

y_k = G^t s_{k+1} + v̄_k,   (9)

with v̄_k = y_k − Π(y_k | x_{k+1}).

Associated with the general backward model (8)-(9) is the backward innovations model:

r_{k-1} = A^t r_k + w_k^b,
y_k = G^t r_k + v_k^b,

with w_k^b = K_b v_k^b and v_k^b = y_k − G^t r_k. Here K_b is the backward Kalman gain: K_b = (C^t − A^t N G)(Λ_0 − G^t N G)^{-1} and N is the backward state covariance matrix, which can be determined from the backward Riccati equation:

N = A^t N A + (C^t − A^t N G)(Λ_0 − G^t N G)^{-1} (C^t − A^t N G)^t.   (10)

It is well known that the stochastic model for the data y_k is not unique. Not only can we introduce an arbitrary similarity transformation for the state (x_k → T x_k), as with all state space models; in addition, there is a whole set of possible state covariance matrices Σ and covariance matrices Q, R and S that give the same second order output covariance matrices. This fact was described in detail by Faurre (1976) and we provide here only a summary of some interesting results.

The set of stochastic models that generate the given output covariance matrices Λ_i is characterized as follows.
(1) The matrices A, C and G are unique up to within a similarity transformation.
(2) The set of all state covariance matrices Σ is closed and bounded: P ≤ Σ ≤ N^{-1}, where P and N are the solutions of the forward (7), respectively backward (10), Riccati equations. (Inequalities are to be interpreted in the sense of nonnegative definiteness.)
(3) For every state covariance matrix Σ satisfying these bounds, the associated matrices R, S and Q follow from the equations Q = Σ − A Σ A^t, S = G − A Σ C^t and R = Λ_0 − C Σ C^t.

Definition 1. A stochastic model will be labeled as balanced if the solutions to the forward and backward Riccati equations are diagonal and equal.
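As a quick numerical illustration of item (2), the sketch below (continuing the assumed model of the previous snippet) iterates both Riccati equations (7) and (10) from zero and checks that the Lyapunov solution Σ indeed lies in Faurre's interval P ≤ Σ ≤ N^{-1}:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.7, 0.3], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.2, 0.05], [0.05, 0.1]])
S = np.array([[0.01], [0.02]])
R = np.array([[0.3]])

Sigma = solve_discrete_lyapunov(A, Q)
Lam0 = C @ Sigma @ C.T + R
G = A @ Sigma @ C.T + S

P = np.zeros_like(A)   # forward Riccati (7)
N = np.zeros_like(A)   # backward Riccati (10)
for _ in range(500):
    P = (A @ P @ A.T + (G - A @ P @ C.T)
         @ np.linalg.inv(Lam0 - C @ P @ C.T) @ (G - A @ P @ C.T).T)
    N = (A.T @ N @ A + (C.T - A.T @ N @ G)
         @ np.linalg.inv(Lam0 - G.T @ N @ G) @ (C.T - A.T @ N @ G).T)

# Faurre's bounds: both differences should be nonnegative definite.
print(np.linalg.eigvalsh(Sigma - P))                 # all >= 0
print(np.linalg.eigvalsh(np.linalg.inv(N) - Sigma))  # all >= 0
```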
3. THE CLASSICAL REALIZATION APPROACH

In this section, we briefly present the classical realization approach, which is based on the fact that the output covariances can be considered as Markov parameters of a linear time-invariant system.

In what follows, output block Hankel matrices of the form

Y_{0|i-1} = ( y_0      y_1      y_2      ⋯  y_{j-2}    y_{j-1}
              y_1      y_2      y_3      ⋯  y_{j-1}    y_j
              y_2      y_3      y_4      ⋯  y_j        y_{j+1}
              ⋯
              y_{i-1}  y_i      y_{i+1}  ⋯  y_{i+j-3}  y_{i+j-2} )

will play a crucial role. The first subscript denotes the time index of the upper left element,

while the second subscript is the time index of the bottom left element. For all output block Hankel matrices, the number of columns will be j, and for all theoretical derivations we assume that j → ∞. How to deal with a finite number of output measurements will become clear in Section 8.

An important remark concerns the estimation of the covariance matrices. Hereto we assume that they can be estimated as

Λ_i = lim_{j→∞} (1/j) Σ_{k=0}^{j-1} y_{k+i} y_k^t,

for which we will use the notation Λ_i = E_j[y_{k+i} y_k^t], with an obvious definition of E_j[·] = lim_{j→∞} (1/j)[·]. So, due to ergodicity and the infinite number of data at our disposition, we can replace the expectation operator E (average over an infinite number of experiments) with the different operator E_j applied to the sum of variables (average over one, infinitely long, experiment).

The classical realization approach now follows directly from equations (5)-(6). Consider the correlation matrix between the future block output Hankel matrix and the past block output Hankel matrix:

E_j[Y_{i|2i-1} Y_{0|i-1}^t] = ( Λ_i       Λ_{i-1}   Λ_{i-2}   ⋯  Λ_2      Λ_1
                                Λ_{i+1}   Λ_i       Λ_{i-1}   ⋯  Λ_3      Λ_2
                                Λ_{i+2}   Λ_{i+1}   Λ_i       ⋯  Λ_4      Λ_3
                                ⋯
                                Λ_{2i-1}  Λ_{2i-2}  Λ_{2i-3}  ⋯  Λ_{i+1}  Λ_i )
                             = 𝒪_i 𝒞_i (say),   (11)

where 𝒪_i = ( C^t  (CA)^t  ⋯  (CA^{i-1})^t )^t is the extended observability matrix and 𝒞_i = ( A^{i-1}G  ⋯  AG  G ) is the (reversed) extended controllability matrix. Hence we obtain, as j → ∞, a rank deficient block Toeplitz matrix. Its rank is equal to the state dimension n, and a model for A, G and C can be obtained from the factorization into an extended observability and controllability matrix, using for instance the approaches via the SVD as in Zeiger and McEwen (1974) or Kung (1978). Note that the rank deficiency is guaranteed to hold as long as i is larger than the largest controllability or observability index of the system.
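A minimal covariance-driven sketch of this classical route (in the spirit of Zeiger and McEwen (1974) and Kung (1978); the simulated model is an assumed example, not from the paper): estimate the Λ_i from one long output record, stack them into the block Toeplitz matrix (11), factor it with an SVD, and recover A from the shift invariance of the extended observability matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed SISO model for illustration: w_k = B e_k, v_k = D e_k.
A, B, C, D = 0.9, 0.6, 0.2, 0.9
K, i = 100_000, 5

e = rng.standard_normal(K + 1)
x = np.zeros(K + 1)
for k in range(K):
    x[k + 1] = A * x[k] + B * e[k]
y = C * x + D * e

lam = lambda m: np.mean(y[m:] * y[:len(y) - m])   # Lambda_m estimate

# Block Toeplitz matrix (11): entry (r, c) = Lambda_{i+r-c}.
H = np.array([[lam(i + r - c) for c in range(i)] for r in range(i)])
U, s, Vt = np.linalg.svd(H)
n = 1                                     # order read off the singular values
O = U[:, :n] * np.sqrt(s[:n])             # extended observability matrix
Cc = np.sqrt(s[:n])[:, None] * Vt[:n]     # extended controllability matrix
C_est = O[0, 0]                           # first block row of O
G_est = Cc[0, -1]                         # last block column of Cc
A_est = (np.linalg.pinv(O[:-1]) @ O[1:])[0, 0]   # shift invariance of O
print("A_est:", A_est, "(true", A, ")")
```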
4. ORTHOGONAL PROJECTIONS

In this section, we describe a new approach that avoids the formation of the covariance matrix (11). The identification scheme is based on geometrical insights, and in contrast with previously described similar algorithms (Akaike (1975); Arun and Kung (1990); Larimore (1990)) that needed doubly infinite block Hankel matrices, consistent estimates are obtained with semi-infinite output block Hankel matrices.

For sake of elegance of the proofs, covariance matrices will still be formed in the theoretical motivation that follows. The "square root" algorithm described in Section 8 will avoid this and will thus have far better numerical robustness.

First, we define the orthogonal projection of semi-infinite matrices as follows.

Definition 2. Orthogonal projections of semi-infinite matrices. Given two matrices P_1 ∈ ℜ^{p1×j} and P_2 ∈ ℜ^{p2×j}, with j → ∞, the orthogonal projection of the row space of P_1 onto the row space of P_2 is defined as: E_j[P_1 P_2^t](E_j[P_2 P_2^t])^{-1} P_2 ∈ ℜ^{p1×j}.

Now, we compute the orthogonal projection of the row space of Y_{i|2i-1} (the future) onto the row space of Y_{0|i-1} (the past). It follows from the rank deficiency of the covariance matrix E_j[Y_{i|2i-1} Y_{0|i-1}^t] (11) that the row space of this projection will be an n-dimensional subspace of the j-dimensional ambient space. This can also be seen from

E_j[Y_{i|2i-1} Y_{0|i-1}^t](E_j[Y_{0|i-1} Y_{0|i-1}^t])^{-1} Y_{0|i-1} = 𝒪_i ( A^{i-1}G  ⋯  AG  G ) L_i^{-1} Y_{0|i-1} = 𝒪_i 𝒞_i L_i^{-1} Y_{0|i-1},

where we define

L_i ≝ E_j[Y_{0|i-1} Y_{0|i-1}^t] = ( Λ_0      Λ_{-1}   Λ_{-2}   ⋯  Λ_{1-i}
                                     Λ_1      Λ_0      Λ_{-1}   ⋯  Λ_{2-i}
                                     Λ_2      Λ_1      Λ_0      ⋯  Λ_{3-i}
                                     ⋯
                                     Λ_{i-1}  Λ_{i-2}  Λ_{i-3}  ⋯  Λ_0 )
                                  = E_j[Y_{i|2i-1} Y_{i|2i-1}^t].

The last equality is due to stationarity. Hence, a basis for the n-dimensional projection of the row space of Y_{i|2i-1} onto that of Y_{0|i-1} is formed by the rows of the matrix Z_i:

Z_i = 𝒞_i L_i^{-1} Y_{0|i-1}.   (12)

In a similar way, we find for the orthogonal projection of the row space of Y_{0|i-1} onto the row space of Y_{i|2i-1}:

E_j[Y_{0|i-1} Y_{i|2i-1}^t](E_j[Y_{i|2i-1} Y_{i|2i-1}^t])^{-1} Y_{i|2i-1} = 𝒞_i^t 𝒪_i^t L_i^{-1} Y_{i|2i-1}.

FIG. 2. Definition of the vector sequences Z_i, W_i, Z_{i+1} and W_{i+1}. The thick lines represent the span of the data, the thin lines represent the projections: "from-onto".

Hence, a basis for the n-dimensional projection of the row space of Y_{0|i-1} onto that of Y_{i|2i-1} is formed by the rows of the matrix W_i:

W_i = 𝒪_i^t L_i^{-1} Y_{i|2i-1}.   (13)

The conclusion is that both the orthogonal projection of the row space of Y_{i|2i-1} onto that of Y_{0|i-1} and the other way around are finite dimensional subspaces of dimension n, with a lot of structure. Of course, in an infinite dimensional setting (i.e. i → ∞ and j → ∞) this result has been known since the pioneering work of Akaike (1975). In this framework, and also the one presented in Pal (1982), those finite dimensional orthogonal projections can be considered as states of the stochastic process. Hence, it is tempting to consider the sequence Z_i (12) as the "forward" state and the sequence W_i (13) as the "backward" state sequence. However, we will see in Section 5 that for finite i, these sequences are just an optimal Kalman prediction of the states based upon the i previous outputs.

To arrive at a consistent identification scheme, we also need to define:
• 𝒪_{i-1} 𝒞_{i+1} L_{i+1}^{-1} Y_{0|i}, which is the orthogonal projection of the row space of Y_{i+1|2i-1} onto that of Y_{0|i}. A basis for the row space of this orthogonal projection is generated by the rows of

Z_{i+1} = 𝒞_{i+1} L_{i+1}^{-1} Y_{0|i}.   (14)

• 𝒞_{i-1}^t 𝒪_{i+1}^t L_{i+1}^{-1} Y_{i-1|2i-1}, which is the orthogonal projection of the row space of Y_{0|i-2} onto that of Y_{i-1|2i-1}. A basis for the row space of this projection is generated by the rows of

W_{i+1} = 𝒪_{i+1}^t L_{i+1}^{-1} Y_{i-1|2i-1}.   (15)

Figure 2 illustrates these projections graphically.
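For finite j, the projections (12)-(15) can be computed directly from the data by least squares, without ever writing down 𝒞_i or L_i. A sketch (Python/NumPy; the data-generating model is again an assumed example):

```python
import numpy as np

def block_hankel(y, first, last, j):
    """Rows y_first ... y_last of the output, stacked as a block Hankel
    matrix with j columns (y is a 1-D array here, i.e. l = 1)."""
    return np.array([y[r:r + j] for r in range(first, last + 1)])

rng = np.random.default_rng(1)
K, i, j = 5000, 4, 4000
e = rng.standard_normal(K)
x = np.zeros(K)
for k in range(K - 1):
    x[k + 1] = 0.9 * x[k] + 0.6 * e[k]
y = 0.2 * x + 0.9 * e

Yp = block_hankel(y, 0, i - 1, j)       # past   Y_{0|i-1}
Yf = block_hankel(y, i, 2 * i - 1, j)   # future Y_{i|2i-1}

# Orthogonal projection of the row space of Yf onto the row space of Yp
# (Definition 2 with E_j replaced by 1/j): solve Yf = M Yp in least squares.
M = np.linalg.lstsq(Yp.T, Yf.T, rcond=None)[0].T
proj = M @ Yp          # its row space is the row space of Z_i

# The numerical rank of M reveals the model order n:
# one dominant singular value for this first order example.
print(np.linalg.svd(M, compute_uv=False))
```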

5. RELATION WITH THE KALMAN FILTER

In this section, we show that the sequences Z_i and Z_{i+1} can be considered as the outputs of a bank of non-steady state Kalman filters after i, respectively i+1, time steps. We also show that the covariance matrices of these sequences are two iterates of the forward Riccati difference equation.

Theorem 1. Given the sequences Z_i = 𝒞_i L_i^{-1} Y_{0|i-1} and Z_{i+1} = 𝒞_{i+1} L_{i+1}^{-1} Y_{0|i}, then:
• Σ_i = E_j(Z_i Z_i^t) and Σ_{i+1} = E_j(Z_{i+1} Z_{i+1}^t) satisfy the forward Riccati difference equation:

Σ_{i+1} = A Σ_i A^t + (A Σ_i C^t − G)(Λ_0 − C Σ_i C^t)^{-1} (C Σ_i A^t − G^t).

• Z_{i+1} = A Z_i + K_i (Y_{i|i} − C Z_i), with K_i = (G − A Σ_i C^t)(Λ_0 − C Σ_i C^t)^{-1} equal to the Kalman gain.

Proof. From the definition of the sequences Z_i = 𝒞_i L_i^{-1} Y_{0|i-1} and Z_{i+1} = 𝒞_{i+1} L_{i+1}^{-1} Y_{0|i}, we find that Σ_i = E_j(Z_i Z_i^t) = 𝒞_i L_i^{-1} 𝒞_i^t and Σ_{i+1} = E_j(Z_{i+1} Z_{i+1}^t) = 𝒞_{i+1} L_{i+1}^{-1} 𝒞_{i+1}^t. If we rewrite Σ_{i+1} with the help of the matrix inversion lemma (note that 𝒞_{i+1} = (A𝒞_i  G) and L_{i+1} = ( L_i  𝒞_i^t C^t ; C 𝒞_i  Λ_0 )), we get, with Δ = Λ_0 − C 𝒞_i L_i^{-1} 𝒞_i^t C^t:

Σ_{i+1} = 𝒞_{i+1} L_{i+1}^{-1} 𝒞_{i+1}^t
 = (A𝒞_i  G) ( L_i  𝒞_i^t C^t ; C𝒞_i  Λ_0 )^{-1} ( 𝒞_i^t A^t ; G^t )
 = (A𝒞_i  G) ( L_i^{-1} + L_i^{-1} 𝒞_i^t C^t Δ^{-1} C 𝒞_i L_i^{-1}   −L_i^{-1} 𝒞_i^t C^t Δ^{-1} ;
                −Δ^{-1} C 𝒞_i L_i^{-1}   Δ^{-1} ) ( 𝒞_i^t A^t ; G^t )
 = A 𝒞_i L_i^{-1} 𝒞_i^t A^t + A 𝒞_i L_i^{-1} 𝒞_i^t C^t Δ^{-1} C 𝒞_i L_i^{-1} 𝒞_i^t A^t − A 𝒞_i L_i^{-1} 𝒞_i^t C^t Δ^{-1} G^t − G Δ^{-1} C 𝒞_i L_i^{-1} 𝒞_i^t A^t + G Δ^{-1} G^t
 = A Σ_i A^t + A Σ_i C^t Δ^{-1} C Σ_i A^t − A Σ_i C^t Δ^{-1} G^t − G Δ^{-1} C Σ_i A^t + G Δ^{-1} G^t
 = A Σ_i A^t + (A Σ_i C^t − G) Δ^{-1} (C Σ_i A^t − G^t),

and with Δ = Λ_0 − C Σ_i C^t, we find:

Σ_{i+1} = A Σ_i A^t + (A Σ_i C^t − G)(Λ_0 − C Σ_i C^t)^{-1} (C Σ_i A^t − G^t),

which is one step of the forward difference Riccati equation. The Kalman gain associated with this equation at time step i is: K_i = (G − A Σ_i C^t)(Λ_0 − C Σ_i C^t)^{-1}. We prove now that Z_{i+1} is formed out of Z_i with the help of this Kalman gain. First, we can write (with N_i and M_i unknown)

Z_{i+1} = N_i Z_i + M_i Y_{i|i},
𝒞_{i+1} L_{i+1}^{-1} Y_{0|i} = N_i 𝒞_i L_i^{-1} Y_{0|i-1} + M_i Y_{i|i}.

Multiplying to the right with Y_{0|i-1}^t L_i^{-1} 𝒞_i^t and applying E_j gives

𝒞_{i+1} L_{i+1}^{-1} E_j[Y_{0|i} Y_{0|i-1}^t] L_i^{-1} 𝒞_i^t = N_i 𝒞_i L_i^{-1} 𝒞_i^t + M_i E_j[Y_{i|i} Y_{0|i-1}^t] L_i^{-1} 𝒞_i^t,
A 𝒞_i L_i^{-1} 𝒞_i^t = N_i 𝒞_i L_i^{-1} 𝒞_i^t + M_i (Λ_i  Λ_{i-1}  ⋯  Λ_1) L_i^{-1} 𝒞_i^t,   (16)
A 𝒞_i L_i^{-1} 𝒞_i^t = N_i 𝒞_i L_i^{-1} 𝒞_i^t + M_i C 𝒞_i L_i^{-1} 𝒞_i^t,
A Σ_i = N_i Σ_i + M_i C Σ_i,
A = N_i + M_i C.

So N_i = A − M_i C, and we can rewrite (16) as:

Z_{i+1} = A Z_i + M_i (Y_{i|i} − C Z_i).

Multiplying to the right with Y_{i|i}^t and applying E_j gives

𝒞_{i+1} L_{i+1}^{-1} E_j[Y_{0|i} Y_{i|i}^t] = A 𝒞_i L_i^{-1} E_j[Y_{0|i-1} Y_{i|i}^t] + M_i (E_j[Y_{i|i} Y_{i|i}^t] − C 𝒞_i L_i^{-1} E_j[Y_{0|i-1} Y_{i|i}^t]),
𝒞_{i+1} (0  0  ⋯  I)^t = A Σ_i C^t + M_i (Λ_0 − C Σ_i C^t),
G = A Σ_i C^t + M_i (Λ_0 − C Σ_i C^t),

where we used E_j[Y_{0|i-1} Y_{i|i}^t] = ( Λ_i^t ; Λ_{i-1}^t ; ⋯ ; Λ_1^t ) = 𝒞_i^t C^t. So M_i = (G − A Σ_i C^t)(Λ_0 − C Σ_i C^t)^{-1} = K_i, which is exactly the Kalman gain. □
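Theorem 1 says that the non-steady state covariances Σ_i and gains K_i are produced by iterating the forward Riccati difference equation from Σ_0 = 0. A small sketch (same assumed model as in the earlier snippets) makes the convergence to the steady state solution P of (7) visible:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.7, 0.3], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.2, 0.05], [0.05, 0.1]])
S = np.array([[0.01], [0.02]])
R = np.array([[0.3]])

Sig = solve_discrete_lyapunov(A, Q)      # true state covariance Sigma
Lam0 = C @ Sig @ C.T + R
G = A @ Sig @ C.T + S

Sigma_i = np.zeros_like(A)               # Sigma_0 = 0: no output seen yet
for i in range(15):
    Ki = (G - A @ Sigma_i @ C.T) @ np.linalg.inv(Lam0 - C @ Sigma_i @ C.T)
    Sigma_i = (A @ Sigma_i @ A.T
               + (A @ Sigma_i @ C.T - G) @ np.linalg.inv(Lam0 - C @ Sigma_i @ C.T)
               @ (C @ Sigma_i @ A.T - G.T))
# After enough steps Sigma_i has converged to the fixed point P of (7),
# and K_i to the steady state Kalman gain K_f.
print(Sigma_i, "\n", Ki)
```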

So, if we start from Z_0 = 0 and Σ_0 = 0, we find Z_1 = K_1 Y_{0|0} = G Λ_0^{-1} Y_{0|0} = 𝒞_1 L_1^{-1} Y_{0|0}. Through induction, this clearly proves that the sequence Z_0, Z_1, ..., Z_i, Z_{i+1} is a sequence of subsequent optimal Kalman state estimates. This has to be interpreted as follows: the lth column of Z_i is the optimal prediction of the state x_{l+i-1}, based on the measurements y_{l-1}, y_l, ..., y_{l+i-2} and with initial conditions for the predicted state: x̂_{l-1} = 0, E(x̂_{l-1} x̂_{l-1}^t) = 0. Figure 3 illustrates this graphically. So, in this way every column of Z_i represents an optimal state estimate, and we can interpret the columns of Z_i as the outputs of a bank of j Kalman filters that estimate j optimal states in parallel.

It should be noted that a larger past (larger i) will lead to a better suppression of the influence of the error on the initial state Z_0, and thus to a smaller variance on the final estimates of the system matrices. This is because the initial state at i lags in the past is set to zero, which is correct on the average, but the true initial state value produces "noise" in the forward and backward estimates of the states (see also Section 9 for an illustration of this effect).

FIG. 3. Interpretation of the sequence Z_i as Kalman filter state estimates based upon i measurements.

6. HOW TO FIND A, C, G AND Λ_0

In this section, we show how the system matrices A, C, G and Λ_0 can be identified consistently, provided that the matrices Z_i (12), W_i (13), Z_{i+1} (14) and W_{i+1} (15) are available.

Theorem 2. Given the sequences Z_i, Z_{i+1}, W_i and W_{i+1}, the matrices A, C, G and Λ_0 satisfy:

A = E_j[Z_{i+1} Z_i^t](E_j[Z_i Z_i^t])^{-1},   (17)
A^t = E_j[W_{i+1} W_i^t](E_j[W_i W_i^t])^{-1},   (18)
C = E_j[Y_{i|i} Z_i^t](E_j[Z_i Z_i^t])^{-1},   (19)
G^t = E_j[Y_{i-1|i-1} W_i^t](E_j[W_i W_i^t])^{-1},   (20)
Λ_0 = E_j[Y_{i|i} Y_{i|i}^t].   (21)

Proof of equation (17).

E_j[Z_{i+1} Z_i^t](E_j[Z_i Z_i^t])^{-1}
 = 𝒞_{i+1} L_{i+1}^{-1} E_j[Y_{0|i} Y_{0|i-1}^t] L_i^{-1} 𝒞_i^t × (𝒞_i L_i^{-1} E_j[Y_{0|i-1} Y_{0|i-1}^t] L_i^{-1} 𝒞_i^t)^{-1}
 = 𝒞_{i+1} ( I_{il} ; 0 ) L_i^{-1} 𝒞_i^t (𝒞_i L_i^{-1} 𝒞_i^t)^{-1}
 = A 𝒞_i L_i^{-1} 𝒞_i^t (𝒞_i L_i^{-1} 𝒞_i^t)^{-1}
 = A.

The proof of equation (18) is similar.

Proof of equation (19).

E_j[Y_{i|i} Z_i^t](E_j[Z_i Z_i^t])^{-1}
 = E_j[Y_{i|i} Y_{0|i-1}^t] L_i^{-1} 𝒞_i^t (𝒞_i L_i^{-1} 𝒞_i^t)^{-1}
 = (Λ_i  Λ_{i-1}  ⋯  Λ_1) L_i^{-1} 𝒞_i^t (𝒞_i L_i^{-1} 𝒞_i^t)^{-1}
 = C 𝒞_i L_i^{-1} 𝒞_i^t (𝒞_i L_i^{-1} 𝒞_i^t)^{-1}
 = C.

The proof of (20) is similar. Equation (21) directly follows from the definition of Λ_0 and of E_j. Hence we have a general outline for finding A, G, C and Λ_0 from output data, if Z_i, Z_{i+1}, W_i and W_{i+1} were available.

Let us as a last remark point out that the matrix A also satisfies the following Sylvester equation:

A E_j[Z_i Z_i^t] + E_j[W_i W_i^t] A = E_j[Z_{i+1} Z_i^t] + E_j[W_i W_{i+1}^t].
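With E_j replaced by 1/j (as Section 8 will do), (17)-(21) are ordinary least squares problems on the state and output sequences. A synthetic check with illustrative values (the state sequences here are random stand-ins for Z_i, Z_{i+1}, Y_{i|i}, just to exercise the formulas):

```python
import numpy as np

def ls_right_inverse(X, Y):
    """M = X Y^t (Y Y^t)^{-1}, the least squares solution of X = M Y,
    computed via lstsq for numerical robustness."""
    return np.linalg.lstsq(Y.T, X.T, rcond=None)[0].T

rng = np.random.default_rng(2)
n, l, j = 2, 1, 10_000
A_true = np.array([[0.7, 0.3], [0.0, 0.5]])
C_true = np.array([[1.0, 0.0]])
Zi = rng.standard_normal((n, j))
Zi1 = A_true @ Zi + 0.05 * rng.standard_normal((n, j))   # plays Z_{i+1}
Yii = C_true @ Zi + 0.05 * rng.standard_normal((l, j))   # plays Y_{i|i}

A_hat = ls_right_inverse(Zi1, Zi)    # equation (17)
C_hat = ls_right_inverse(Yii, Zi)    # equation (19)
Lam0_hat = Yii @ Yii.T / j           # equation (21)
print(np.round(A_hat, 3), np.round(C_hat, 3))
# The dual formulas (18) and (20) use the backward sequences W_i, W_{i+1}
# and the block row Y_{i-1|i-1} in exactly the same way.
```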
7. THE STATES AS PRINCIPAL DIRECTIONS

In this section we show that the matrices Z_i and W_i are generated from the principal directions between the past and the future output data (up to a diagonal scaling). The matrices Z_{i+1} and W_{i+1} are also shown to be equal to principal directions, up to within a similarity transformation.

7.1. Principal angles and directions

Definition 3. Principal angles and directions. Given two matrices U ∈ ℜ^{p×j} and V ∈ ℜ^{q×j} (j → ∞) of full row rank and the Singular Value Decomposition (SVD):†

U^t (E_j[U U^t])^{-1} E_j[U V^t] (E_j[V V^t])^{-1} V = P^t S Q.   (22)

Then the principal directions between the row spaces of U and V are defined as the left (P) and right (Q) singular vectors (the principal directions in the row space of U, respectively V). The cosines of the principal angles between the row spaces of U and V are defined as the singular values (the diagonal of S). The principal directions and angles between the row spaces of U and V will be denoted as:

⟪U, V⟫ = [P, S, Q].

Geometrical motivation. The principal angles between two subspaces are a generalization of an angle between two vectors (the concept goes back to Jordan (1875)). Suppose we are given two matrices U ∈ ℜ^{p×j} and V ∈ ℜ^{q×j}. The first principal angle θ_1 (the smallest one) is obtained as follows: choose unit vectors u_1 ∈ R_row(U) and v_1 ∈ R_row(V) and minimize the angle between them.‡ Next choose a unit vector u_2 ∈ R_row(U) orthogonal to u_1 and v_2 ∈ R_row(V) orthogonal to v_1, and minimize the angle θ_2 between them. This is the second principal angle. Continue in this way until min(p, q) angles have been found. This informal description can also be formalized.

Definition 4. Principal angles and directions. The principal angles θ_1 ≤ θ_2 ≤ ⋯ ≤ π/2 between the row spaces R_row(U) and R_row(V) of two matrices U ∈ ℜ^{p×j} and V ∈ ℜ^{q×j}, and the corresponding principal directions u_k ∈ R_row(U) and v_k ∈ R_row(V), are defined recursively as

cos θ_k = max_{u ∈ R_row(U), v ∈ R_row(V)} u^t v = u_k^t v_k,

subject to ‖u‖ = ‖v‖ = 1 and, for k > 1, u^t u_i = 0 for i = 1, ..., k−1 and v^t v_i = 0 for i = 1, ..., k−1.

We will show in Section 8 how the principal angles and directions can be calculated efficiently.

† The Singular Value Decomposition is written here as A = P^t S Q instead of A = P S Q^t for convenience of notation.
‡ The notation R_row(·) refers to the row space of the matrix between brackets.
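For finite j, the cosines in (22) are simply the canonical correlations of the rows of U and V: they can be obtained from the SVD of the small p × q matrix (UU^t)^{-1/2}(UV^t)(VV^t)^{-1/2}, which avoids forming the j × j product in (22). A sketch of this equivalent, covariance-forming computation (the square root route of Section 8 is numerically preferable):

```python
import numpy as np
from scipy.linalg import sqrtm

def principal_angle_cosines(U, V):
    """Cosines of the principal angles between the row spaces of U and V,
    from the whitened cross-covariance (a small p x q matrix)."""
    Wu = np.linalg.inv(sqrtm(U @ U.T).real)
    Wv = np.linalg.inv(sqrtm(V @ V.T).real)
    return np.linalg.svd(Wu @ (U @ V.T) @ Wv, compute_uv=False)

rng = np.random.default_rng(3)
j = 1000
shared = rng.standard_normal(j)
U = np.vstack([shared, rng.standard_normal(j)])            # shares one direction
V = np.vstack([shared + 0.1 * rng.standard_normal(j),
               rng.standard_normal(j)])
print(principal_angle_cosines(U, V))   # one cosine near 1, one near 0
```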

7.2. Determination of Z_i, W_i, Z_{i+1} and W_{i+1}

Theorem 3. Given (with j → ∞):

⟪Y_{0|i-2}, Y_{i-1|2i-1}⟫ = [P_{i+1}, S_{i+1}, Q_{i+1}],
⟪Y_{0|i-1}, Y_{i|2i-1}⟫ = [P_i, S_i, Q_i],
⟪Y_{0|i}, Y_{i+1|2i-1}⟫ = [P_{i-1}, S_{i-1}, Q_{i-1}].

Then:
• The number of elements in S_i different from zero is equal to the state dimension n. The same holds for S_{i-1} and S_{i+1}. This means that there are only n principal angles different from π/2. Denote with P_{i+1}^1, P_i^1, P_{i-1}^1, Q_{i+1}^1, Q_i^1, Q_{i-1}^1 the matrices with the first n principal directions, and with S_{i+1}^1, S_i^1, S_{i-1}^1 the n × n matrices with the (non-zero) elements of S_{i+1}, S_i, S_{i-1} on the diagonals.
• Then, a possible choice for the state matrices is given by:§

Z_i = (S_i^1)^{1/2} P_i^1,   (23)
W_i = (S_i^1)^{1/2} Q_i^1,   (24)
Z_{i+1} = [(𝒪̄_i)† 𝒪_{i-1}] (S_{i-1}^1)^{1/2} P_{i-1}^1,   (25)
W_{i+1} = [(𝒞̲_i^t)† 𝒞_{i-1}^t] (S_{i+1}^1)^{1/2} Q_{i+1}^1,   (26)

with

𝒪_i ≝ E_j[Y_{i|2i-1} (P_i^1)^t (S_i^1)^{-1/2}],   (27)
𝒞_i^t ≝ E_j[Y_{0|i-1} (Q_i^1)^t (S_i^1)^{-1/2}],   (28)
𝒪_{i-1} ≝ E_j[Y_{i+1|2i-1} (P_{i-1}^1)^t (S_{i-1}^1)^{-1/2}],   (29)
𝒞_{i-1}^t ≝ E_j[Y_{0|i-2} (Q_{i+1}^1)^t (S_{i+1}^1)^{-1/2}].   (30)

Proof. Since the rank of E_j[Y_{i|2i-1} Y_{0|i-1}^t] is equal to n (see (11)), we know that the left hand side of (22) will be of rank n. This implies that there are n singular values different from zero, which proves the fact that there are only n principal angles different from π/2. From (12) we know that Z_i can be chosen to be any basis for the projection of the row space of Y_{i|2i-1} onto the row space of Y_{0|i-1}. From Definition 3, we know that (S_i^1)^{1/2} P_i^1 is such a basis. This proves equation (23).

In a similar way as we proved equation (19), it is easy to derive that:

𝒪_i = E_j[Y_{i|2i-1} Z_i^t](E_j[Z_i Z_i^t])^{-1}.   (31)

Together with E_j[P_i^1 (P_i^1)^t] = I, equation (27) follows easily. Following the same reasoning, equations (24) and (28) can be derived.

Equation (25) can be derived as follows. We know from (14) that Z_{i+1} is equal to any basis of the n-dimensional projection of the row space of Y_{i+1|2i-1} onto the row space of Y_{0|i}. Once again, we know from Definition 3 that M_{i-1} (S_{i-1}^1)^{1/2} P_{i-1}^1 is a valid basis, with M_{i-1} an arbitrary similarity transform. In the same way as above, we derive that the 𝒪_{i-1} corresponding to the choice of this basis is equal to:

𝒪̂_{i-1} = E_j[Y_{i+1|2i-1} (P_{i-1}^1)^t (S_{i-1}^1)^{1/2} M_{i-1}^t] × (M_{i-1} S_{i-1}^1 M_{i-1}^t)^{-1}
        = E_j[Y_{i+1|2i-1} (P_{i-1}^1)^t] (S_{i-1}^1)^{-1/2} M_{i-1}^{-1}
        = 𝒪_{i-1} M_{i-1}^{-1}.

If we strip the last l rows from 𝒪_i in (27), we must find the same matrix 𝒪̂_{i-1}. Thus:

𝒪_{i-1} M_{i-1}^{-1} = 𝒪̄_i.

This leads to M_{i-1} = (𝒪̄_i)† 𝒪_{i-1}, which proves equation (25). Equation (26) can be proven in a similar way.

8. A NUMERICALLY EFFICIENT ALGORITHM

We have derived in Section 6 that the matrices A and C can be estimated from the matrices Z_i and Z_{i+1}, while A^t and G^t can be estimated from W_i and W_{i+1}. In this section, we show how these matrices can be computed starting from the output data and without explicit calculation of the output covariance matrices.

Up to now, we have been working with an infinite amount of data (j → ∞). However, for all practical applications the number of data is finite. This is why, from now on, we will replace the operator E_j = lim_{j→∞} (1/j)[·] by (1/j)[·]. The orthogonal projection as defined in Definition 2 thus becomes a normal orthogonal projection of finite size matrices:

(1/j)[U V^t] ((1/j)[V V^t])^{-1} V = U V^t (V V^t)^{-1} V.

8.1. Principal directions and the quotient singular value decomposition

In this subsection, we derive an elegant algorithm to compute principal angles and directions, which is based on a combination of

§ A† denotes the Moore-Penrose pseudo-inverse. The matrices Ā and A̲ denote the matrix A with, respectively, the last l and the first l rows omitted.

the QR-decomposition and the QSVD. The QSVD is a generalization for two matrices of the SVD for one matrix.

Lemma 1. Any pair of matrices A ∈ ℜ^{m×n} and B ∈ ℜ^{p×n} can be decomposed as

A = U_a S_a X^t,
B = U_b S_b X^t,

where U_a ∈ ℜ^{m×m} and U_b ∈ ℜ^{p×p} are orthonormal, X ∈ ℜ^{n×n} is square non-singular, and S_a ∈ ℜ^{m×n} and S_b ∈ ℜ^{p×n} have the following quasi-diagonal structure (column partitioning r_{ab}−r_b, r_a+r_b−r_{ab}, r_{ab}−r_a, n−r_{ab} in both cases):

S_a = ( I   0    0  0      } r_{ab} − r_b
        0   D_a  0  0      } r_a + r_b − r_{ab}
        0   0    0  0 )    } m − r_a

S_b = ( 0   0    0  0      } p − r_b
        0   D_b  0  0      } r_a + r_b − r_{ab}
        0   0    I  0 )    } r_{ab} − r_a

where r_a = rank(A), r_b = rank(B), r_{ab} = rank( (A ; B) ) and

D_a^2 + D_b^2 = I_{r_a+r_b−r_{ab}}.

The original formulation of the QSVD (which used to be called the generalized SVD, but see also De Moor and Golub (1989)) is due to Paige and Saunders (1981) and Van Loan (1976). We are now ready to derive the following theorem (which to our knowledge is new).

Theorem 4. Consider the QR-decomposition of the concatenated matrix

( A ; B ) = ( R_{11}  0 ; R_{21}  R_{22} ) ( Q_1^t ; Q_2^t ),

where both A and B are of full row rank, and consider a QSVD of the matrix pair

R_{21}^t = U S X^t,
R_{22}^t = V T X^t,

where U, V are orthonormal, S and T quasi-diagonal and X square nonsingular. Then the principal angles and directions between the row spaces of A and B are given by:

⟪A, B⟫ = [(Q_1 U)^t, S, (Q_1 U S + Q_2 V T)^t].

Proof. Because A and B are of full row rank, we have

A^t (A A^t)^{-1} A = Q_1 Q_1^t,
B^t (B B^t)^{-1} B = (Q_1 U S + Q_2 V T)(S^t U^t Q_1^t + T^t V^t Q_2^t).

It follows from the properties of the QSVD that S^t S + T^t T = I, and hence Q_1 U S + Q_2 V T is an orthonormal matrix. We find that

A^t (A A^t)^{-1} A B^t (B B^t)^{-1} B = (Q_1 U) S (Q_1 U S + Q_2 V T)^t,

which is clearly an SVD. It follows from Definition 3 that it contains the principal angles and directions. □

The major advantage of this result lies in the fact that principal angles and directions can basically be computed from blocks of the R-factor of a QR-decomposition. This R-factor consists of matrices of which the dimensions are small compared to j.

It should be pointed out that a different way for computing principal angles and directions, without forming covariance matrices, was developed by Björck and Golub (1973).
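A compact finite-j variant of that idea (orthonormal bases for both row spaces from QR, then one small SVD, in the style of Björck and Golub) is sketched below; it returns the same angles and directions as Theorem 4, at the cost of two QR-decompositions of j × p and j × q matrices:

```python
import numpy as np

def principal_angles_qr(A, B):
    """Principal angles/directions between the row spaces of A (p x j)
    and B (q x j), via orthonormal bases (Bjorck-Golub style)."""
    Qa, _ = np.linalg.qr(A.T)      # columns: orthonormal basis of row(A)
    Qb, _ = np.linalg.qr(B.T)      # columns: orthonormal basis of row(B)
    Ua, s, Vbt = np.linalg.svd(Qa.T @ Qb)
    dirs_A = (Qa @ Ua).T           # principal directions in row(A)
    dirs_B = Vbt @ Qb.T            # principal directions in row(B)
    return np.arccos(np.clip(s, -1.0, 1.0)), dirs_A, dirs_B

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 500))
B = np.vstack([A[0] + 0.05 * rng.standard_normal(500),
               rng.standard_normal(500)])
angles, _, _ = principal_angles_qr(A, B)
print(np.degrees(angles))          # first angle near 0, second near 90
```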
8.2. Algorithm

Consider the QR-decomposition of the output block Hankel matrix Y_{0|2i-1}, with the following appropriate partitioning (which is also visualized in Fig. 2):

               l(i−1)  l       l       l(i−1)
Y_{0|2i-1} = ( R_{11}  0       0       0        } l(i−1)
               R_{21}  R_{22}  0       0        } l
               R_{31}  R_{32}  R_{33}  0        } l
               R_{41}  R_{42}  R_{43}  R_{44} ) } l(i−1)
             × ( Q_1^t ; Q_2^t ; Q_3^t ; Q_4^t ).   (32)

It should be noted that Verhaegen (1991) uses a similar QR-decomposition in his identification algorithms. Now, compute three QSVDs:

(1) ( R_{41}  R_{42}  R_{43} )^t = U_{i-1} S_{i-1} X_{i-1}^t,
    R_{44}^t = V_{i-1} T_{i-1} X_{i-1}^t,

(2) ( R_{31}  R_{32} ; R_{41}  R_{42} )^t = U_i S_i X_i^t,
    ( R_{33}  0 ; R_{43}  R_{44} )^t = V_i T_i X_i^t,

(3) ( R_{21} ; R_{31} ; R_{41} )^t = U_{i+1} S_{i+1} X_{i+1}^t,
    ( R_{22}  0  0 ; R_{32}  R_{33}  0 ; R_{42}  R_{43}  R_{44} )^t = V_{i+1} T_{i+1} X_{i+1}^t.

With n, the state space dimension, determined from the number of principal angles different from π/2, let X_{i-1}^1, X_i^1, X_{i+1}^1, U_{i-1}^1, U_i^1, U_{i+1}^1, V_{i-1}^1, V_i^1 and V_{i+1}^1 with superscript 1 denote the first n columns of the respective matrices. Also, let S_{i-1}^1, S_i^1, S_{i+1}^1, T_{i-1}^1, T_i^1 and T_{i+1}^1 with superscript 1 denote the n × n upper left sub-matrices of the respective matrices. Now, equations (27)-(30) can be evaluated, and after some straightforward calculations, we find:‡

𝒪_i ≐ X_i^1 (S_i^1)^{1/2},
𝒞_i^t ≐ R_{1:2,1:2} U_i^1 (S_i^1)^{1/2},
𝒪_{i-1} ≐ X_{i-1}^1 (S_{i-1}^1)^{1/2},
𝒞_{i-1}^t ≐ R_{1:1,1:1} U_{i+1}^1 (S_{i+1}^1)^{1/2}.

The similarity transformation of equation (25) can be calculated as follows:

[(𝒪̄_i)† 𝒪_{i-1}] ≐ (S_i^1)^{-1/2} (X̄_i^1)† X_{i-1}^1 (S_{i-1}^1)^{1/2}.

The state sequences can be found from equations (23)-(26) as:

Z_i ≐ (S_i^1)^{1/2} (U_i^1)^t Q_{1:2}^t,   (33)
W_i ≐ (S_i^1)^{1/2} [(U_i^1 S_i^1)^t Q_{1:2}^t + (V_i^1 T_i^1)^t Q_{3:4}^t],   (34)
Z_{i+1} ≐ (S_i^1)^{-1/2} (X̄_i^1)† X_{i-1}^1 S_{i-1}^1 (U_{i-1}^1)^t Q_{1:3}^t,
W_{i+1} ≐ [(𝒞̲_i^t)† 𝒞_{i-1}^t] (S_{i+1}^1)^{1/2} [(U_{i+1}^1 S_{i+1}^1)^t Q_{1:1}^t + (V_{i+1}^1 T_{i+1}^1)^t Q_{2:4}^t].

Finally, from equations (17)-(21), we find the system matrices as:

A ≐ (S_i^1)^{-1/2} (X̄_i^1)† X_{i-1}^1 S_{i-1}^1 (U_{i-1}^1)^t ( U_i^1 ; 0 ) (S_i^1)^{-1/2},
C ≐ the first l rows of 𝒪_i ≐ [X_i^1 (S_i^1)^{1/2}]_{first l rows},
G^t ≐ the last l rows of 𝒞_i^t ≐ [R_{1:2,1:2} U_i^1 (S_i^1)^{1/2}]_{last l rows},
Λ_0 ≐ (1/j) R_{3:3,1:3} R_{3:3,1:3}^t,

where in the expression for A the matrix U_i^1 has been extended with l zero rows (this is the product Q_{1:3}^t Q_{1:2} U_i^1). The expressions for C and G^t are a logical consequence of (19)-(20) and the way 𝒪_i (𝒞_i) is calculated (31). Equation (18) is not repeated here, since both ways ((17) and (18)) of calculating A are equivalent (for j large enough). Note that in these final expressions, the orthogonal matrices Q_1, Q_2, Q_3 and Q_4 have canceled out, so they do not have to be calculated; only the R factor has to be calculated. Another observation concerns the "covariance" matrices Z_i Z_i^t and W_i W_i^t. From (33)-(34) we find Z_i Z_i^t = S_i^1 = W_i W_i^t. Hence, these products are diagonal and equal to each other. So, the algorithm returns stochastic models that are balanced (for finite i). If i → ∞, we have stochastically balanced models as defined in Pal (1982).

‡ R_{k:l,r:s} denotes the sub-matrix of the R factor in (32) taken from block row k to block row l and from block column r to block column s. Q_{k:l}^t denotes the submatrix of Q^t (in the same RQ decomposition) from block row k to block row l. The symbol ≐ means that the equality holds when j → ∞.
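The sketch below assembles the whole pipeline in a simplified form: states from the past-to-future projection with one common observability basis (so that Z_i and Z_{i+1} live in the same coordinates), then A, C, G and Λ_0 by least squares as in Theorem 2. Plain SVDs replace the QSVD square root route, and the backward sequences W_i are omitted, so this is an illustration of the ideas rather than the paper's exact recipe:

```python
import numpy as np

def stoch_subspace_id(y, i, n):
    """Simplified stochastic subspace identification sketch (SISO, l = 1).
    Z_i and Z_{i+1} are extracted with the same observability basis, which
    is the essential point of Sections 7-8; SVD stands in for the QSVD."""
    l, K = 1, len(y)
    j = K - 2 * i
    Y = np.array([y[r:r + j] for r in range(2 * i)])   # Y_{0|2i-1}
    Yp, Yf = Y[:i], Y[i:]              # past / future
    Yp1, Yf1 = Y[:i + 1], Y[i + 1:]    # shifted split

    def project(F, P):                 # row-space projection of F onto P
        return F @ P.T @ np.linalg.pinv(P @ P.T) @ P

    Oi_Zi = project(Yf, Yp)            # = O_i Z_i
    Oi1_Zi1 = project(Yf1, Yp1)        # = O_{i-1} Z_{i+1}

    U, s, _ = np.linalg.svd(Oi_Zi, full_matrices=False)
    Oi = U[:, :n] * np.sqrt(s[:n])     # extended observability: fixes the basis
    Zi = np.linalg.pinv(Oi) @ Oi_Zi
    Zi1 = np.linalg.pinv(Oi[:-l]) @ Oi1_Zi1   # same basis via stripped O_i

    Yii = Y[i:i + l]                   # block row Y_{i|i}
    A = Zi1 @ np.linalg.pinv(Zi)       # equation (17)
    C = Yii @ np.linalg.pinv(Zi)       # equation (19)
    G = Zi1 @ Yii.T / j                # E_j[Z_{i+1} Y_{i|i}^t] = G
    Lam0 = Yii @ Yii.T / j             # equation (21)
    return A, C, G, Lam0

# Check on the SISO example of Section 9.1:
rng = np.random.default_rng(6)
K = 20_000
e = rng.standard_normal(K)
x = np.zeros(K)
for k in range(K - 1):
    x[k + 1] = 0.9490 * x[k] + 0.6574 * e[k]
y = 0.1933 * x + 0.8981 * e
A_hat, C_hat, G_hat, Lam0_hat = stoch_subspace_id(y, i=6, n=1)
print(A_hat)   # close to 0.9490
```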
9. AN EXAMPLE

9.1. Taking the same linear relations

Estimates of the state sequence Z_i are obtained by taking linear combinations (F_i) of the past output matrix Y_{0|i-1} as Z_i = F_i Y_{0|i-1}. Now, in earlier work (Akaike (1975); Arun and Kung (1990); Larimore (1990)), the shifted state sequence Z_{i+1} is computed by applying the same linear combinations (F_i) to the shifted Hankel matrix Y_{1|i} as Z_{i+1} = F_i Y_{1|i}. While this trick works for purely deterministic systems (see e.g. Moonen et al. (1989)), it does not work for stochastic systems for finite i, as our formulas in the previous section clearly indicate. In general, one can state that for small i, the bias for the classical approach (same linear combinations) becomes worse as some of the poles of A move towards the unit circle. Only for i → ∞ will this classical approach be exact. The following easy example illustrates this clearly. Applying formulae (12) and (14) to a block Hankel matrix with, respectively, one and two block rows (SISO system) gives:

Z_1 = 𝒞_1 L_1^{-1} Y_{0|0} = G Λ_0^{-1} Y_{0|0},

Z_2 = (AG  G) ( Λ_0  Λ_1 ; Λ_1  Λ_0 )^{-1} Y_{0|1}.

We then find that

Z_2 Z_1^t (Z_1 Z_1^t)^{-1} = (AG  G) ( Λ_0  Λ_1 ; Λ_1  Λ_0 )^{-1} E_j[Y_{0|1} Y_{0|0}^t] Λ_0^{-1} G^t × (G Λ_0^{-1} G^t)^{-1} = A.

If on the other hand we take, as is usually done, the same linear combinations for Z_2 as for Z_1 (namely G Λ_0^{-1}), we find:

Z_2 = G Λ_0^{-1} Y_{1|1},

and

Z_2 Z_1^t (Z_1 Z_1^t)^{-1} = Λ_1/Λ_0 = CG/(C Σ C^t + R),

which has nothing to do with A. For a numerical example, consider the very simple SISO system with only one state (Q = B B^t, S = B D^t, R = D D^t):

A = 0.9490,  B = 0.6574,  C = 0.1933,  D = 0.8981.
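The bias of the "same linear combinations" shortcut is easy to reproduce numerically. In the sketch below the consistent value is checked through the scalar shift invariance Λ_2/Λ_1 = A, which follows from equation (5); this covariance-based check stands in for the projection-based estimate and is not the algorithm of Section 8:

```python
import numpy as np

A, B, C, D = 0.9490, 0.6574, 0.1933, 0.8981
rng = np.random.default_rng(5)
biased, consistent = [], []
for _ in range(100):                       # 100 Monte Carlo runs
    K = 1000
    e = rng.standard_normal(K)
    x = np.zeros(K)
    for k in range(K - 1):
        x[k + 1] = A * x[k] + B * e[k]     # w_k = B e_k
    y = C * x + D * e                      # v_k = D e_k
    lam = lambda m: np.mean(y[m:] * y[:K - m])     # Lambda_m estimate
    biased.append(lam(1) / lam(0))     # = CG/(C Sigma C^t + R), not A
    consistent.append(lam(2) / lam(1)) # = A (scalar shift invariance)
print(np.mean(biased), np.mean(consistent))   # about 0.28 versus 0.95
```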
Figure 4 (top) shows the eigenvalues of the identified A matrix (a scalar), in function of the parameter i (j = 1000), for three algorithms:
(a) taking the same linear combinations, as described in this section;
(b) the algorithm as described in Section 8;
(c) the algorithm described in Aoki (1987).

For every i, 100 Monte Carlo experiments were performed. The mean value of these 100 experiments is plotted for every i on Fig. 4 (the exact value is 0.949). Clearly the algorithm described in this section (which is the one that is normally used in the literature) calculates heavily biased solutions for small i. The bias is still significant for i as large as 8 (eight times larger than the dimension of the state vector). On the other hand, the new algorithm (of Section 8) does not suffer at all from this problem. Neither does the method of Aoki, which is a covariance based method and is mentioned just for comparison.

Figure 4 (bottom) also shows the variance of the estimated A matrix. Clearly the variance grows smaller when i grows larger (as mentioned in Section 5). It should also be noted that, for small i, the biased algorithm of this section trades off bias for a smaller variance. Even though it is not clear from this example, we found from other experiments that the variance for the algorithm of Aoki tends to be larger than the variance for the algorithm described in this paper.

FIG. 4. Mean (top) and variance (bottom) of the estimated eigenvalues in function of i for the algorithm of Section 9.1 (circle), for the algorithm of this paper (star) and for the algorithm of Aoki (cross).

9.2. An example with undamped poles

For finite i, the difference between the classical approach and the approach described in this paper becomes more pronounced as A becomes more and more marginally stable. Clearly a limiting case occurs when A has poles exactly on the unit circle. Consider the system:

A = ( 0.6  −0.8  0 ; 0.8  0.6  0 ; 0  0  0.8 ),  B = (⋯),  C = (⋯  0.3  0.5),  D = (1),

which is a third order system with one output and complex poles on the unit circle (this is a limiting case of a stable stochastic system). So, there is a pure autonomous sinusoidal mode in the output (initial state x_0 = (1  1  0)^t). The eigenvalues of A are 0.6 ± 0.8i and 0.8. Once again, we performed 100 Monte Carlo simulations (i = 4, j = 1000) for the two algorithms. Figure 5 shows the absolute values of the 100 estimates of the poles. The top figure used the algorithm of the previous subsection. The full lines are the absolute values of the exact poles (1 and 0.8). Clearly there is a bias on the estimates of about 0.2 = 20%. The second figure shows the estimates with the new algorithm of this paper. The bias is eliminated. The sample mean for these 100 experiments was:

m_a = ( 0.7993 ; 0.6733 ),  m_b = ( 0.9983 ; 0.8080 ).

The real values are: 1 and 0.8.

10. CONCLUSIONS

In this paper, we have derived a new subspace algorithm for the consistent identification of stochastic state space descriptions, using only semi-infinite block Hankel matrices of output measurements. The explicit formation of the covariance matrix and of doubly infinite block Hankel matrices is avoided. The algorithm is based on principal angles and directions. We have described a numerically robust and fast way to calculate these on the basis of the QR and QSVD decompositions. We have also interpreted the principal directions as non-steady state Kalman filter states. With an example, we have illustrated that the new algorithm is superior to the classical canonical correlation algorithms, especially for small i.

FIG. 5. One hundred estimates of the absolute value of the eigenvalues with the algorithm of Section 9.1 (top). The horizontal lines show the exact results. The bottom figure shows the same for the algorithm of this paper.

REFERENCES

Akaike, H. (1975). Markovian representation of stochastic processes by canonical variables. SIAM J. Control, 13, 162-173.
Aoki, M. (1987). State Space Modeling of Time Series. Springer Verlag, Berlin.
Arun, K. S. and S. Y. Kung (1990). Balanced approximation of stochastic systems. SIAM J. Matrix Analysis and Applications, 11, 42-68.
Björck, Å. and G. H. Golub (1973). Numerical methods for computing angles between subspaces. Numer. Math., 27, 579-594.
De Moor, B. and G. H. Golub (1989). Generalized singular value decompositions: a proposal for a standardized nomenclature. Numerical Analysis Project Report 89-04, Department of Computer Science, Stanford University.
Faurre, P. (1976). Stochastic realization algorithms. In R. Mehra and D. Lainiotis (Eds), System Identification: Advances and Case Studies. Academic Press, NY.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321-377.
Jordan, C. (1875). Essai sur la géométrie à n dimensions. Bull. Soc. Math. France, 3, 103-174.
Kung, S. Y. (1978). A new identification and model reduction algorithm via singular value decomposition. Proc. of the 12th Asilomar Conf. on Circuits, Systems and Computers, Pacific Grove, pp. 705-714.
Larimore, W. E. (1990). Canonical variate analysis in identification, filtering, and adaptive control. Proc. of the IEEE 29th Conf. on Decision and Control, HI, pp. 596-604.
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall information and system sciences series, Englewood Cliffs, NJ.
Moonen, M., B. De Moor, L. Vandenberghe and J. Vandewalle (1989). On- and off-line identification of linear state space models. Int. J. of Control, 49, 219-232.
Paige, C. C. and M. A. Saunders (1981). Towards a generalized singular value decomposition. SIAM J. Numer. Anal., 18, 398-405.
Pal, D. (1982). Balanced stochastic realization and model reduction. Master's thesis, Washington State University, Electrical Engineering.
Papoulis, A. (1984). Probability, Random Variables, and Stochastic Processes. McGraw-Hill, NY.
Van Loan, C. F. (1976). Generalizing the singular value decomposition. SIAM J. Numer. Anal., 13, 76-83.
Verhaegen, M. (1991). A novel non-iterative MIMO state space model identification technique. Preprints 9th IFAC/IFORS Symposium on Identification and System Parameter Estimation, Budapest, Hungary, pp. 1453-1458.
Zeiger, H. P. and A. J. McEwen (1974). Approximate linear realization of given dimension via Ho's algorithm. IEEE Trans. Aut. Control, AC-19, 153.
