Shaping Frequency Dependent Time Resolution when Estimating Spectral Properties with Parametric Methods

Fredrik Gustafsson, Svante Gunnarsson, Lennart Ljung
Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
Email: fredrik@isy.liu.se, svante@isy.liu.se, ljung@isy.liu.se
EDICS: 3.9.3
Abstract: The problem of tracking time-varying properties of a signal is studied. The somewhat contradictory notion of a "time-varying spectrum", and how to estimate the "current" spectrum in an on-line fashion, is discussed. The traditional concepts of, and relations between, time and frequency resolution are crucial for this problem. We introduce two definitions of the time resolution of filters, essentially measuring the effective number of past data that are used to form the estimate. In, for example, wavelet transform techniques, frequency dependent time resolutions are used so that fewer data are used at higher frequencies, thus enabling faster tracking of high frequency components (at the price of worse frequency resolution). The main contribution of the paper is to show how this same feature can be introduced when estimating spectra via a time-varying autoregressive model of the signal. This is achieved by a special choice of nominal covariance matrix for the underlying parameter changes.

I. Introduction

It is a basic problem in many applications to study and track the time-varying properties of various signals. This is at the heart of adaptation and detection mechanisms, and there is a rich literature on the subject, e.g. [15] and [12]. In many contexts it is very attractive to describe the signal characteristics in the frequency domain, i.e. by its spectral properties. The spectrum is itself an averaged, time-invariant concept, and its generalization to a "time-varying" spectrum is somewhat tricky. One aspect of this problem lies in the well known frequency-time uncertainty relation, i.e. that the frequency resolution depends on the time span. We will argue that it is natural to demand a quicker response, i.e. better time resolution from the adaptive algorithm, at the high frequency end than at the low frequency end. In other words, we seek a frequency dependent time resolution of our algorithm. This, as such, is nothing new. A typical use of the wavelet transform ([1], [14], [2]) is exactly to have different trade-offs between time and frequency resolution in different frequency bands. From this perspective we shall examine current parametric adaptation algorithms and see if they can offer this desired feature. It will turn out that the most used adaptation algorithms, Least Mean Squares (LMS) and Recursive Least Squares (RLS), do not give this kind of trade-off: the time resolution for RLS is frequency independent, while for LMS it depends on the level of the spectrum (not the frequency).

The major point of this contribution is, however, that a frequency-time trade-off of the desired type can be achieved also in parametric modeling. The key is to use a Kalman-filter based algorithm with a carefully tailored state noise covariance matrix. It is worth stressing that there is no "optimal" solution to the signal tracking problem, unless the character of the time variation is known. Instead, some ad hoc choices are used: LMS for the simplicity of the algorithm, or RLS for the rapid recovery after a sudden change. These two approaches can be seen as corresponding to different, particular choices of state noise covariance matrices in a Kalman filter (see [11]), and each has just a scalar tuning parameter (step length and forgetting factor, respectively). What we suggest is another ad hoc, default choice of adaptation, also with one scalar tuning parameter and corresponding to a particular state noise covariance matrix. The merit of this approach is that it gives a natural solution to the time-frequency resolution problem.

The paper is organized as follows. In Section 2 we discuss the notion of a "time-varying" spectrum and how it can be formalized, and in Section 3 we make some brief comments on methods for non-parametric spectrum modeling. Section 4 then deals with techniques for parametric spectrum modeling, and in Section 5 we show how the time and frequency resolution can be characterized in terms of a frequency dependent window size, and an algorithm is proposed. A completely different approach, based on a random walk model of the spectrum, is in Section 6 shown to give the same result. The proposed technique is then illustrated on simulated data in Section 7. Finally, some conclusions are given in Section 8.

II. Time-Varying Spectra

Consider a signal y(t), which we for this discussion take to be observed in discrete time:

    y(t), \quad t = 0, 1, \ldots    (1)

One of the most successful ways to describe the properties of y(t) is to study its spectrum

    \Phi_y(\omega) = \sum_{k=-\infty}^{\infty} R_y(k) e^{-ik\omega}    (2)

where

    R_y(k) = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} y(t) y(t-k)    (3)

assuming that the limit exists for all k. There is of course an extensive literature on how to estimate and utilize spectra; see for example [9]. Now, the spectrum is inherently a time-invariant, or time-averaged, property. If the signal has time-varying properties (whatever that means) they will not show up in (2), other than in a time-averaged fashion. Nevertheless we may want to capture "time-varying properties" in spectral terms, at least intuitively. There are many attempts to describe such time-varying spectra,

    "\Phi_y(\omega, t)"    (4)
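The definitions (2)-(3) translate directly into a sample-based estimator: replace the limit in (3) by a finite average and truncate the sum in (2). The Python sketch below illustrates this; the truncation lag K and the sinusoidal test signal are our own illustrative choices, not taken from the paper.

```python
import cmath, math

def autocov(y, k):
    # Sample version of (3): R_y(k) ~ (1/N) * sum_t y(t) y(t-k)
    N = len(y)
    return sum(y[t] * y[t - k] for t in range(k, N)) / N

def spectrum(y, omega, K=50):
    # Truncated version of (2): sum over k = -K..K of R_y(k) e^{-i k omega}
    return sum(autocov(y, abs(k)) * cmath.exp(-1j * k * omega)
               for k in range(-K, K + 1)).real

# A pure sinusoid at omega0 = 0.8 rad: the estimate is large near omega0
# and small away from it.
y = [math.cos(0.8 * t) for t in range(4000)]
print(spectrum(y, 0.8), spectrum(y, 2.0))
```

As expected for a stationary signal, the estimate is sharply concentrated around the sinusoid's frequency; nothing in (2)-(3) can reveal when in time that frequency was active, which is exactly the limitation the rest of the paper addresses.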

from simple spectrograms (using spectral estimates computed from finite and moving blocks of observed data) to sophisticated transforms of various kinds. Lately there has been substantial interest in the wavelet transforms, also as a means to capture some variant of (4). We shall briefly comment on some of these approaches in the next section.

We can think of \Phi_y(\omega, t) as a "snapshot" of the signal's frequency contents at time moment t. It is clear, though, that due to the uncertainty relationship between time and frequency there will be problems in interpreting what a "momentary frequency" might be. Let us here introduce a formal definition of \Phi_y(\omega, t) that in itself is non-contradictory. We shall assume that the signal y(t) is generated from a stationary signal source e(t) as an AR-process:

    A_t(q) y(t) = e(t)    (5)

or, in longhand,

    y(t) = -a_1(t) y(t-1) - \cdots - a_n(t) y(t-n) + e(t)    (6)

where

    A_t(q) = 1 + a_1(t) q^{-1} + \cdots + a_n(t) q^{-n}    (7)

Here e(t) is white noise with variance r^2 and q^{-1} is the inverse shift operator. For the signal y(t), generated by (6), we define the momentary spectrum as

    \Phi_y(\omega, t) = \frac{r^2}{|A_t(e^{i\omega})|^2}    (8)

In [10] the authors use the term instantaneous spectrum for this quantity. This is an exact definition of a momentary spectrum, but the question is whether (8) captures what we intuitively have in mind with the concept "spectrum". We can make two rather obvious observations around this: "a quick change" in the spectrum at low frequencies is rather to be interpreted as a high frequency component in the signal, and to be perceived as a variation of the spectrum at a certain frequency, the rate of change must be significantly slower (a factor 10 or so) than the frequency itself. All this is of course well in agreement with well-known practical ways of handling "time-varying spectra". In amplitude or frequency modulation, the modulating signal must change much more slowly than the carrier. That will also allow the signal to pass with the carrier through the band pass filters designed for the carrier. The bottom line of this discussion is thus: while (6)-(8) make perfect sense as a formal definition, it is only meaningful as a definition of "time-varying spectra" if the time variation of A_t(q) is such that \Phi_y(\omega, t) changes significantly more slowly than the frequency \omega in question.

III. Motivation and Relation to Other Approaches

This section briefly describes some non-parametric and parametric approaches to estimating time-varying spectra. We will focus on different possibilities to get different time resolution at different frequencies.

A. Non-parametric modeling

Among all proposed approaches to spectrum estimation, see for example [8], [9], [3], [7], the simplest one is based on the squared magnitude of a Fourier transform Y(\omega) of the signal, commonly referred to as the periodogram,

    \hat\Phi_y(\omega) = |Y(\omega)|^2    (9)

We choose to discuss extensions to the time-varying case in terms of the spectral factor Y(\omega, t) rather than the spectrum itself. The simplest way to get a time-frequency representation is to use the Short-Time Fourier Transform (STFT)

    Y^{STFT}(\omega, t) = \sum_{\tau} y(\tau) w_N(t - \tau) e^{-i\omega\tau}    (10)

Here and in the sequel w_N(t) is a time window used to obtain time resolution. In this case, we take the Fourier transform of the time-windowed signal. The time window's length N trades off good time resolution (small N) against good frequency resolution (large N).

A related but conceptually different method that recently has met considerable interest is the Wavelet Transform (WT). Surveys of wavelet theory are provided by [1], [14], [2], and its relation to time-frequency representations is given in [7]. The mother wavelet w_N(t) is a bandpass function of effective time width N_0. If we assume that the Fourier transform of w_N(t) is essentially concentrated to its center frequency, say \omega_0, then we have a time-frequency representation,

    Y^{WT}(\omega, t) = \int y(\tau) \sqrt{\frac{\omega}{\omega_0}} \, w_N\!\left(\frac{\omega}{\omega_0}(\tau - t)\right) d\tau    (11)

Since the mother wavelet w_N is fixed, wavelet theory thus suggests a narrower time window at higher frequencies. See also (15) below.

B. Time Resolution

We shall in this paper use two different, but related, definitions of the time resolution of filters for spectral estimation.

Definition 1: Let the spectral estimate \hat\Phi_y of a signal y be obtained by linear filtering of y, followed by a static, non-linear transformation:

    \hat\Phi_y^t(\omega) = f\!\left( \int_{-\infty}^{t} h_\omega(t, \tau) y(\tau) \, d\tau \right)    (12)

The time resolution is then defined as

    N(\omega) = \frac{\left( \int_{-\infty}^{t} |h_\omega(t, \tau)| \, d\tau \right)^2}{\int_{-\infty}^{t} |h_\omega(t, \tau)|^2 \, d\tau}    (13)

For a discrete time signal we use summation instead of integration. Note that the definition implies that N(\omega) \geq 1, with equality for impulse-like time windows. This definition is well in accordance with usual measures of filter widths, like for tapering windows of various kinds. A rectangular window h_\omega(t, \tau) = c of length N has the time resolution

    N(\omega) = \frac{(Nc)^2}{Nc^2} = N    (14)

For the wavelet filter (11), using the variable substitution s = \frac{\omega}{\omega_0}(\tau - t), we find that

    N(\omega) = \frac{\left( \int \left| \sqrt{\frac{\omega}{\omega_0}} \, w_N\!\left(\frac{\omega}{\omega_0}(\tau - t)\right) \right| d\tau \right)^2}{\int \left| \sqrt{\frac{\omega}{\omega_0}} \, w_N\!\left(\frac{\omega}{\omega_0}(\tau - t)\right) \right|^2 d\tau} = \frac{\omega_0}{\omega} \cdot \frac{\left( \int |w_N(s)| \, ds \right)^2}{\int |w_N(s)|^2 \, ds} = \frac{\omega_0}{\omega} N(\omega_0)    (15)

That is, the effective number of data points that are used to form the spectral estimate at each point is inversely proportional to frequency, and the scale factor is fixed and given by the mother wavelet.
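Definition 1 is easy to evaluate numerically. The sketch below (Python; the particular window shapes are our own illustrative choices) checks the rectangular-window value (14), the exponential-window value (1+\lambda)/(1-\lambda), roughly 2/(1-\lambda), used later in the paper, and the dilation behavior that underlies the wavelet scaling (15).

```python
import math

def time_resolution(h):
    # Discrete form of Definition 1, eq. (13): N = (sum |h|)^2 / sum |h|^2
    return sum(abs(x) for x in h) ** 2 / sum(abs(x) ** 2 for x in h)

# Rectangular window of length 25: resolution is exactly 25, as in (14).
print(time_resolution([1.0] * 25))

# Exponential forgetting window lambda^(t-k): resolution tends to
# (1 + lam)/(1 - lam), about 2/(1 - lam) for lam close to 1.
lam = 0.95
expw = [lam ** m for m in range(2000)]
print(time_resolution(expw))       # close to (1 + 0.95)/(1 - 0.95) = 39

# Dilating a window by a factor a scales the resolution by a,
# mirroring the wavelet scaling property in (15).
g1 = [math.exp(-((k - 200) / 20.0) ** 2) for k in range(400)]
g2 = [math.exp(-((k - 200) / 40.0) ** 2) for k in range(400)]
print(time_resolution(g2) / time_resolution(g1))   # close to 2
```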

C. Parametric modeling

Parametric models for estimating time-invariant spectra have been used for a long time; see for instance [9]. To unify parametric and non-parametric methods, consider first the following piecewise constant Fourier transform:

    Y(\omega, t) = c_1(t) I_1(\omega) + c_2(t) I_2(\omega) + \cdots + c_n(t) I_n(\omega)    (16)

where

    I_k(\omega) = \begin{cases} 1 & \omega_{k-1} < \omega < \omega_k \\ 0 & \text{otherwise} \end{cases}    (17)

Other interpolation functions are possible. Note that the number of coefficients n does not need to be the same as the number of data. The parameters of this model correspond directly to the frequency content in the interval \omega_{k-1} < \omega < \omega_k. One way to estimate these parameters is to use the discrete Fourier transform at the first frequency point in each interval. A variant of this, with an exponential time window, is

    \hat c_m(t) = (1 - \lambda_m) \sum_{k=1}^{t} \lambda_m^{t-k} y(k) e^{-ik\omega_m} = \lambda_m \hat c_m(t-1) + (1 - \lambda_m) y(t) e^{-it\omega_m}    (18)

A rectangular window w_N(t) of size N has the advantage of implying a natural choice of frequencies \omega_k = 2\pi k / N, k = 1, 2, \ldots, N, as the usual FFT bins. Compared to the rectangular window, the advantages of an exponential window are the recursive implementation, given by the second equality in (18), where somewhat fewer computations are needed, and that no memory for old data is needed. In the filter (18), the number 0 < \lambda_m < 1 is usually referred to as the forgetting factor. The time resolution of (18) is, according to Definition 1,

    N(\omega_m) = \frac{\left( \sum_{k=1}^{t} \lambda_m^{t-k} \right)^2}{\sum_{k=1}^{t} \lambda_m^{2(t-k)}} \approx \frac{1 - \lambda_m^2}{(1 - \lambda_m)^2} = \frac{1 + \lambda_m}{1 - \lambda_m}    (19)
    \approx \frac{2}{1 - \lambda_m}    (20)

where the first approximation is for large t (or rather small \lambda_m^t) and the second one for \lambda_m close to 1. Now we have an additional degree of freedom in choosing frequency dependent forgetting factors (or time windows). Choose a time resolution N(\omega) and let

    \lambda_k = 1 - \frac{2}{N(\omega_k)}    (21)

As long as we just want to look at a plot of a time-varying spectrum, (16) might be enough. If we want something more out of our model, we need more sophisticated models. This is what we now turn to.

IV. Autoregressive Signal Models and Their Estimation

A. The Model

As an alternative to (16), we introduce another parameterization with the same number of parameters and basically the same degree of freedom, but with another interpolation between the frequency points than a piecewise constant function. The AR model (5) we propose is standard in time series analysis. Assume that the signal y is generated from a white Gaussian noise source e with zero mean and variance r^2 as

    y(t) = -a_1(t) y(t-1) - \cdots - a_n(t) y(t-n) + e(t) = \varphi^T(t) \theta(t) + e(t)    (22)

where

    \varphi(t) = (-y(t-1), \ldots, -y(t-n))^T    (23)
    \theta(t) = (a_1(t), \ldots, a_n(t))^T    (24)

The corresponding spectral factor is then

    Y(\omega, t) = \frac{\sqrt{r^2}}{1 + a_1(t) e^{-i\omega} + a_2(t) e^{-i2\omega} + \cdots + a_n(t) e^{-in\omega}}    (25)

For notational convenience we write the momentary spectrum as

    \Phi_y(\omega, t) = \frac{r^2}{|1 + \theta(\omega, t)|^2}    (26)

where

    \theta(\omega, t) = W^*(\omega) \theta(t)    (27)

Here * denotes complex conjugate transpose and

    W(\omega) = (e^{i\omega}, \ldots, e^{in\omega})^T    (28)

We have the following advantages with (25) over (16): the AR model (22) can be used as a predictor, omitting the unpredictable noise e(t); AR models are known to be able to give high frequency resolution; and the AR model has found many practical applications in adaptive control and in adaptive filtering problems in e.g. communications. One drawback with the AR model is that it is not known how to shape the time resolution as a function of frequency. In Section V it will be described how the AR coefficients can be estimated with a frequency dependent forgetting factor \lambda(\omega), similar to the one in (21).

B. Recursive Estimation

For the estimation of the momentary spectrum via the parameters in the AR-model we shall use an algorithm defined by the update equation

    \hat\theta(t) = \hat\theta(t-1) + K(t) \varepsilon(t)    (29)

where

    \varepsilon(t) = y(t) - \varphi^T(t) \hat\theta(t-1)    (30)

    K(t) = \frac{P(t-1) \varphi(t)}{\hat r_2 + \varphi^T(t) P(t-1) \varphi(t)}    (31)

    P(t) = P(t-1) - \frac{P(t-1) \varphi(t) \varphi^T(t) P(t-1)}{\hat r_2 + \varphi^T(t) P(t-1) \varphi(t)} + \hat R_1(t)    (32)

This algorithm can be interpreted as a Kalman filter for an underlying state space model of the variations of the parameters \theta(t); see, for example, [12]. In this interpretation the matrix \hat R_1(t) is the assumed covariance matrix of the changes in the true parameter vector, while \hat r_2 is the assumed variance of the noise e(t). In this paper we will, however, not make any assumptions concerning the behavior of the true parameters, and rather consider the algorithm just as a method for computing parameter estimates. The matrix \hat R_1(t) will then be seen as a tool for adjusting the tracking properties of the algorithm, while \hat r_2 can be used as a scaling factor. We shall return to the choices of the design variables \hat R_1(t) and \hat r_2 in a moment. Using the algorithm above we obtain

    \hat\theta(\omega, t) = W^*(\omega) \hat\theta(t)    (33)

and the spectral estimate will be

    \hat\Phi_y(\omega, t) = \frac{\hat r_2}{|1 + \hat\theta(\omega, t)|^2}    (34)

C. Filtering interpretation

Although the algorithm can be derived and motivated from a stochastic perspective, it should be noted that for a given sequence \{\varphi(t)\}, equations (29)-(33) are just a linear, time-varying filter into which y(t) is fed, and which produces \hat\theta(\omega, t) as its output. To emphasize this observation we use equations (29) and (30) to obtain

    \hat\theta(t) = (I - K(t) \varphi^T(t)) \hat\theta(t-1) + K(t) y(t)    (35)
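As a concrete illustration, the recursion (29)-(32) and the spectral estimate (33)-(34) can be sketched in a few lines of Python. The AR(2) test signal, the initialization P(0) = 100 I, and the choice \hat R_1 = 10^{-6} I are our own illustrative assumptions, not values from the paper.

```python
import cmath, random

def kf_ar_step(theta, P, phi, y, R1, r2):
    # One step of (29)-(32): Kalman-filter update of the AR parameters.
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = r2 + sum(phi[i] * Pphi[i] for i in range(n))    # r2 + phi' P phi
    K = [Pphi[i] / denom for i in range(n)]                 # gain (31)
    eps = y - sum(phi[i] * theta[i] for i in range(n))      # residual (30)
    theta = [theta[i] + K[i] * eps for i in range(n)]       # update (29)
    P = [[P[i][j] - Pphi[i] * Pphi[j] / denom + R1[i][j]    # Riccati step (32)
          for j in range(n)] for i in range(n)]
    return theta, P

def spectrum_estimate(theta, omega, r2):
    # (33)-(34): theta(omega) = W*(omega) theta, then r2 / |1 + theta(omega)|^2.
    th = sum(theta[k] * cmath.exp(-1j * (k + 1) * omega) for k in range(len(theta)))
    return r2 / abs(1 + th) ** 2

random.seed(1)
theta, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
R1 = [[1e-6, 0.0], [0.0, 1e-6]]
y1 = y0 = 0.0
for _ in range(3000):
    e = random.gauss(0.0, 1.0)
    y = 1.5 * y1 - 0.7 * y0 + e        # AR(2) signal, i.e. theta = (-1.5, 0.7)
    theta, P = kf_ar_step(theta, P, [-y1, -y0], y, R1, 1.0)
    y0, y1 = y1, y
print(theta)
```

The estimate settles near the true coefficients (-1.5, 0.7), and the spectral estimate (34) then peaks near the AR resonance.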

Assuming \hat\theta(0) = 0 we can express \hat\theta(t) as

    \hat\theta(t) = \sum_{k=1}^{t} \left[ \prod_{j=k+1}^{t} (I - K(j) \varphi^T(j)) \right] K(k) y(k)    (36)

This implies

    \hat\theta(\omega, t) = \sum_{k=1}^{t} h(t, k) y(k)    (37)

where the time-varying and frequency dependent impulse response is given by

    h(t, k) = W^*(\omega) \left[ \prod_{j=k+1}^{t} (I - K(j) \varphi^T(j)) \right] K(k)    (38)

D. Time Resolution

For a given sequence \{\varphi(t)\} the impulse response defined by equation (38) is a deterministic quantity, and we could then apply Definition 1 to this general filter to find out its time resolution. It turns out that the denominator of (13) can readily be calculated (asymptotically for small \hat R_1), while the numerator is more difficult. We shall therefore introduce the following alternative definition of time resolution for the filter (29)-(33):

Definition 2: For the filter (29)-(33) we define the time resolution as

    N(\omega, t) = \frac{W^*(\omega) \left[ \frac{1}{t} \sum_{k=1}^{t} \varphi(k) \varphi^T(k) \right]^{-1} W(\omega)}{\sum_{k=1}^{t} |h(t, k)|^2}    (39)

We shall motivate this definition in two different ways, and then the main theorem is given.

D.1 Deterministic setting

We will show that Definitions 1 and 2 coincide when \varphi is scalar and constant. First, if \varphi(k) is scalar and constant, \varphi(k) = c, then as t \to \infty

    \lim_{t\to\infty} \left[ \sum_{k=1}^{t} |h(t, k)| \right]^2 = \frac{1}{c^2} = \lim_{t\to\infty} W^*(\omega) \left[ \frac{1}{t} \sum_{k=1}^{t} \varphi(k) \varphi^T(k) \right]^{-1} W(\omega)    (40)

To see this, note that after a transient also K(t) will be a constant scalar d, so from

    h(t, k) = W^*(\omega) \left[ \prod_{j=k+1}^{t} (I - K(j) \varphi^T(j)) \right] K(k)    (41)

we obtain

    |h(t, k)| = (1 - dc)^{t-k} d    (42)

so

    \sum_{k=1}^{t} |h(t, k)| = d \, \frac{1 - (1 - dc)^t}{dc} \to \frac{1}{c} \quad \text{as } t \to \infty    (43)

which proves (40), and hence that Definitions 1 and 2 coincide in this special case.

D.2 Stochastic setting

We can also motivate the alternative definition by invoking some statistical arguments. In a stochastic setting, the time resolution of Definition 2 is the length of a rectangular window that gives the same variance error as the given algorithm. We consider, with some abuse, (22) as a linear regression model with a constant \theta and a given sequence \{\varphi(t)\}. Suppose we estimate \theta using the least squares method with t measurements with equal weight, k = 1, \ldots, t ("a rectangular time window of length t"), and that the sequence \{e(t)\} is white noise with unit variance. Then it is well known that the covariance matrix of this estimate \hat\theta_t will be

    E \tilde\theta_t \tilde\theta_t^T = \left[ \sum_{k=1}^{t} \varphi(k) \varphi^T(k) \right]^{-1}    (44)

Here \tilde\theta denotes the parameter error

    \tilde\theta(t) = \theta - \hat\theta(t)    (45)

This means that the variance of the error in \theta(\omega) (defined by (27)) will be equal to the numerator of (39), divided by t. In other words, the numerator of (39) equals t times the variance of \theta(\omega), when estimated using a rectangular window of length t.

Let us now turn to the algorithm (29)-(33), again assuming that \theta(t) \equiv \theta, and that \{e(t)\} is white noise with unit variance. We note that the parameter error obeys the following recursion:

    \tilde\theta(t) = (I - K(t) \varphi^T(t)) \tilde\theta(t-1) + K(t) e(t)    (46)

i.e. the same difference equation as for \hat\theta(t), but driven by the white noise e(t) instead of the output. The error in \theta(\omega) can be written

    \tilde\theta(\omega, t) = W^*(\omega) \tilde\theta(t)    (47)

giving

    \tilde\theta(\omega, t) = \sum_{k=1}^{t} h(t, k) e(k)    (48)

where h(t, k) is given by equation (38). Computing the variance of \tilde\theta(\omega, t), using the Kalman filter algorithm, hence gives us the denominator of (39). This gives us the interpretation that N(\omega, t) in Definition 2 is the length of a rectangular data window for a least squares estimate that gives the same error variance as the algorithm.

D.3 An explicit expression for the time resolution

The time resolution according to Definition 2 can be computed explicitly for large values of the model order n, and for "slow adaptation", i.e. small values of the matrix \hat R_1. This result is given as a theorem.

Theorem 1: Consider the algorithm (23), (29)-(34) for estimating the spectrum of a signal y, with \hat R_1(t) = \hat R_1. Its time resolution, according to Definition 2, is

    N(\omega, t) \approx \frac{2}{\sqrt{ \Phi_y(\omega, t) \, \frac{1}{n \hat r_2} W^*(\omega) \hat R_1 W(\omega) }}    (49)

The result is asymptotic in n \to \infty and \|\hat R_1\| \to 0, in the sense that the ratio of the two sides of (49) tends to 1 under these limits,

    \lim_{n\to\infty} \lim_{\|\hat R_1\|\to 0} \frac{N(\omega, t)}{2} \sqrt{ \Phi_y(\omega, t) \, \frac{1}{n \hat r_2} W^*(\omega) \hat R_1 W(\omega) } = 1    (50)

Proof: The proof is based on variance results for recursively estimated models, derived in [4]. Define

    \Pi(\omega, t) = \text{Var} \, \tilde\theta(\omega, t)    (51)

Using (47) above we get

    \Pi(\omega, t) = W^*(\omega) \Pi(t) W(\omega)    (52)

where

    \Pi(t) = E[ \tilde\theta(t) \tilde\theta^T(t) ]    (53)

From [4] and [6] we have that, for small \hat R_1, the matrix \Pi(t) can be approximated by a constant matrix \Pi, which satisfies

    \Pi Q Z + Z Q \Pi = Z Q Z    (54)

where the matrix Z is given by

    Z Q Z = \frac{1}{\hat r_2} \hat R_1    (55)

and

    Q = \lim_{t\to\infty} \frac{1}{t} \sum_{k=1}^{t} \varphi(k) \varphi^T(k)    (56)

Applying the limit operation described in Appendix A to Q, \Pi, Z, and \hat R_1, we get

    q(\omega) = \lim_{n\to\infty} \frac{1}{n} W^*(\omega) Q W(\omega)    (57)
    \pi(\omega) = \lim_{n\to\infty} \frac{1}{n} W^*(\omega) \Pi W(\omega)    (58)
    z(\omega) = \lim_{n\to\infty} \frac{1}{n} W^*(\omega) Z W(\omega)    (59)
    \hat r_1(\omega) = \lim_{n\to\infty} \frac{1}{n} W^*(\omega) \hat R_1 W(\omega)    (60)

From [13] we have \Phi_y(\omega) = q(\omega), and for large n and t

    W^*(\omega) \left[ \frac{1}{t} \sum_{k=1}^{t} \varphi(k) \varphi^T(k) \right]^{-1} W(\omega) \approx \frac{n}{q(\omega)}    (61)

which happens to be the numerator of (39). Now, apply these relations together with (95), first applied to (54),

    \pi(\omega) = \frac{1}{2} z(\omega)    (62)

and then to (55),

    z(\omega) = \sqrt{ \frac{\hat r_1(\omega)}{\hat r_2 \Phi_y(\omega)} }    (63)

Combining (62) and (63), and noting that \pi(\omega) = \lim_{n\to\infty} \Pi(\omega, t)/n, gives that for large but finite n the variance \Pi(\omega, t) is approximately given by

    \Pi(\omega, t) \approx n \pi(\omega) = \frac{n}{2} \sqrt{ \frac{\hat r_1(\omega)}{\hat r_2 \Phi_y(\omega)} } = \frac{n}{2} \sqrt{ \frac{1}{n \hat r_2 \Phi_y(\omega, t)} W^*(\omega) \hat R_1 W(\omega) }    (64)

Inserting this expression in (39), together with the expression (61) for the numerator, gives the desired result.

The design problem is thus to solve (49) for \hat R_1, given a desired N(\omega).

E. Special Cases of the General Algorithm

The general algorithm (29)-(33) has two design variables, \hat r_2 and \hat R_1. In many cases this gives "too much freedom" to be handled in a rational way. Therefore the most commonly used algorithms are special cases with just one scalar design variable (obtained as special choices of \hat r_2 and \hat R_1).

A very common such special case is to use the gain vector K(t) as given by the recursive least squares (RLS) method; see [12]. Here the parameter estimate is updated according to (29) and the gain vector is

    K(t) = \frac{P(t-1) \varphi(t)}{\lambda + \varphi^T(t) P(t-1) \varphi(t)}    (65)

and

    P(t) = \frac{1}{\lambda} \left[ P(t-1) - \frac{P(t-1) \varphi(t) \varphi^T(t) P(t-1)}{\lambda + \varphi^T(t) P(t-1) \varphi(t)} \right]    (66)

The variable 0 < \lambda \leq 1 is the so-called forgetting factor, which is used to control the length of the update step in the algorithm. This special case is obtained by choosing

    \hat R_1(t) = \left( \frac{1}{\lambda} - 1 \right) \left[ P(t-1) - \frac{P(t-1) \varphi(t) \varphi^T(t) P(t-1)}{\lambda + \varphi^T(t) P(t-1) \varphi(t)} \right], \quad \hat r_2(t) = \lambda    (67)

see [11]. Another standard choice is the least mean squares (LMS) algorithm; see, for example, [15] for details. The LMS algorithm corresponds to choosing

    K(t) = \mu \varphi(t)    (68)

where \mu is a positive scalar. The LMS algorithm is obtained from the general algorithm, see [11], with the choices

    \hat R_1(t) = \frac{\mu^2 \varphi(t) \varphi^T(t)}{1 + \mu |\varphi(t)|^2}, \quad \hat r_2(t) = 1, \quad P(0) = \mu I    (69)
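For the scalar, constant-regressor case of Section D.1, one can check numerically that the RLS recursion (65)-(66) produces an essentially exponential impulse response, and that Definitions 1 and 2 then give the same number, approaching (1 + \lambda)/(1 - \lambda), roughly 2/(1 - \lambda), the RLS resolution (72). The Python sketch below uses illustrative values c = 2, \lambda = 0.95 and our own initialization P(0) = 100.

```python
# Scalar RLS (65)-(66) with a constant regressor phi(t) = c. After a
# transient the gain K(t) is constant, and the impulse response h(t, k)
# of (38) becomes an exponential window.
c, lam, T = 2.0, 0.95, 800
P, Ks = 100.0, []
for _ in range(T):
    K = P * c / (lam + c * c * P)                        # gain (65)
    P = (P - P * c * c * P / (lam + c * c * P)) / lam    # update (66)
    Ks.append(K)

h = []
for k in range(T):
    prod = 1.0
    for j in range(k + 1, T):
        prod *= 1.0 - Ks[j] * c
    h.append(abs(prod * Ks[k]))

N1 = sum(h) ** 2 / sum(x * x for x in h)       # Definition 1, eq. (13)
N2 = (1.0 / c ** 2) / sum(x * x for x in h)    # Definition 2, eq. (39), scalar case
print(N1, N2)   # both close to (1 + lam)/(1 - lam) = 39
```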

Using results in [11], for small \mu the LMS algorithm approximately corresponds to choosing

    \hat R_1 = \mu^2 Q    (70)

(with Q defined by (56)), while the RLS algorithm, for \lambda close to one, can be interpreted as choosing

    \hat R_1 = (1 - \lambda)^2 Q^{-1}    (71)

Inserting these \hat R_1 into (49) gives the time resolutions

    N^{RLS}(\omega) = \frac{2}{1 - \lambda}    (72)
    N^{LMS}(\omega) = \frac{2}{\mu \Phi_y(\omega)}    (73)

In conventional studies of recursive parameter estimation algorithms, the trade-off between tracking ability and variance reduction is a key issue. In the above cases this trade-off is made in terms of the scalars \mu and \lambda: a small \mu, or a value of \lambda close to one, gives long time horizons. This leads to poor time resolution (poor tracking), but also to spectral estimates with low variances (the estimate is based on more observations). Time resolution and reliable estimates (low variance) are thus two sides of the same coin. The trade-off made is therefore based on information, or assumptions, on how fast the signal properties are changing.

Note that in the two cases (72) and (73) the time resolution as a function of frequency comes automatically with the method: for RLS the time resolution is independent of frequency, while LMS gives a dependence such that a higher level of the spectrum gives a shorter time horizon. Whether this is desired or not will depend on the actual signal properties.

V. Shaping the Time Window in the Kalman Filter

We saw in the previous section that we can approximately compute the time resolution of the general Kalman filter based algorithm (29)-(34) according to Theorem 1. We also saw how the common algorithms RLS and LMS give particular expressions for the time resolution as a function of frequency, (72)-(73). In this section we shall turn the question around: suppose that we insist on a particular choice of time resolution as a function of frequency, N(\omega); how shall we then design the filter (29)-(34) to ensure this choice?

The key is of course Theorem 1. If the desired resolution is N(\omega), then choose the scalar \hat r_2 and the matrix \hat R_1(t) so that

    \frac{1}{n \hat r_2} W^*(\omega) \hat R_1(t) W(\omega) = \frac{4}{N^2(\omega) \hat\Phi_y(\omega, t)}    (74)

Here n is the size of the state vector (the order of the underlying AR model).

How can we choose a matrix R so that W^*(\omega) R W(\omega) will be a specific function, say f(\omega)? One possibility would be to fix a finite number of values of \omega and view (74) as a system of equations (one for each frequency) to be solved in terms of the entries of the R-matrix. Another possibility is to choose R to be Toeplitz, so that its k, \ell element is r(\ell - k). As discussed in the appendix, the Fourier transform of the function r that builds up the Toeplitz matrix R can be expressed as

    f(\omega) = \sum_{\tau=-\infty}^{\infty} r(\tau) e^{i\omega\tau} = \lim_{n\to\infty} \frac{1}{n} W^*(\omega) R W(\omega)    (75)

Furthermore, as also discussed in the appendix, under certain regularity conditions

    \lim_{n\to\infty} \frac{1}{n} W^*(\omega) R^{-1} W(\omega) = \frac{1}{f(\omega)}    (76)

This gives us three different possibilities to select \hat R_1 in (74):
- Fix a number of values of \omega (typically equally spaced over [0, 2\pi]) and solve (74) for the matrix elements in the (symmetric) matrix \hat R_1.
- Take the inverse Fourier transform of the RHS of (74) and let \hat R_1 be formed as a Toeplitz matrix from this sequence.
- Form the inverse Fourier transform of the inverse of the RHS of (74). Then form \hat R_1 as the inverse of the corresponding Toeplitz matrix.

To examine the first approach, let

    T^* = [ W(\omega_1) \; W(\omega_2) \; \cdots \; W(\omega_{n_\omega}) ]    (77)

so that T T^* = n_\omega I_{n_\omega}. Consider now the equation

    T R T^* = F = \text{diag}( f(\omega_1), f(\omega_2), \ldots, f(\omega_{n_\omega}) )    (78)

The diagonal elements correspond to the relations in (74), and the off-diagonal elements turn out to be irrelevant. Then

    R = \frac{1}{n_\omega^2} T^* F T    (79)

This is a feasible solution, since (74) is satisfied exactly at the points \omega_1, \ldots, \omega_{n_\omega} when n_\omega = n_a (in which case also T^* T = n_\omega I). Furthermore, it is a direct consequence of the exponential function that this solution interpolates (74) smoothly between the grid points. Also, if we choose n_\omega > n_a and solve (78) in a least squares sense, then again the solution gives a nice interpolation of (74).

We illustrate these three methods with a simple example.

Example 1: Consider the case (74). Let \hat\Phi_y(\omega) \equiv 1 and N^2(\omega) = 1/((\pi/10)^2 + \omega^2). This is similar to the basic wavelet case, except for the cut-off frequency \pi/10. The cut-off frequency is included to avoid f(\omega) being a pure double differentiation. Let us choose n = 5 and 20, respectively, and compute a corresponding \hat R_1 using the three methods. In all methods, N(\omega) is evaluated in 64 regularly spaced points between 0 and \pi. We evaluate the design by solving (74) for N(\omega) at 100 new frequencies not included in the design step, and the result is shown in Figure 1. We see that the inverse Toeplitz method is superior at low frequencies in approximating N(\omega) = 1/\sqrt{f(\omega)}. The result of the least squares solution is not shown in the figures, because it turns out to give identical results to the direct Toeplitz method, a fact which is not obvious from the above. However, the number of computations needed to get the least squares solution is much larger; in this case, where the FFT can be used, the difference is a factor 1000. The impulse responses shown to the right in Figure 1 give an interpretation of why the inverse Toeplitz method is better: its impulse response converges more quickly.

Based on the results of such examples, the inverse Toeplitz method is proposed, and the result is summarized in an algorithm.

Algorithm 1 (Frequency Selective Kalman Filter (FSKF)): Choose the AR model order n and the shape of the time window N(\omega). Run the basic filter (29)-(34) with the following choice of \hat R_1(t):

1. Form the function

    f(\omega) = \frac{4}{N^2(\omega) \hat\Phi_y(\omega, t)}    (80)

[Figure 1 consists of four plots. Left panels ("Approximation ability of the time window"): N(\omega) versus \omega for the specified time window, the direct Toeplitz method, and the inverse Toeplitz method. Right panels ("Inverse transforms of f(\omega) and 1/f(\omega)"): the IDFT of f(\omega) and of 1/f(\omega).]

Fig. 1. Approximation ability of the three methods for model orders 5 (upper plots) and 20 (lower plots).

using (34).
2. Compute the inverse Fourier transform of f^{-1}(\omega), evaluated at a regularly spaced frequency grid, using the IFFT.
3. Let P be the n \times n Toeplitz matrix built up from this inverse Fourier transform.
4. Let \hat R_1(t) = P^{-1}.

Finally, use the scalar \hat r_2 in the filter to tune the scale of the time window.

With these techniques we have constructed an alternative algorithm for computing time-varying properties of a signal. Like the basic RLS and LMS methods, it has only one scalar design variable that controls the basic trade-off between tracking ability (time resolution) and variance of the estimates. In this case it is the scalar \hat r_2.
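Steps 1-4 of Algorithm 1 (the inverse Toeplitz method) can be sketched directly; the grid size, the model order, and the Example 1 shapes \hat\Phi_y \equiv 1 and N^2(\omega) = 1/((\pi/10)^2 + \omega^2) are illustrative assumptions in the Python/NumPy code below.

```python
import numpy as np

def fskf_R1(N2_of_w, phi_y_of_w, n, n_grid=64):
    # Inverse Toeplitz method of Algorithm 1:
    # 1. f(w) = 4 / (N(w)^2 * Phi_y(w)) on a regular grid, cf. (80)
    # 2. inverse FFT of 1/f(w)
    # 3. n x n Toeplitz matrix P from that sequence
    # 4. R1 = inverse of P
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    w_folded = np.minimum(w, 2.0 * np.pi - w)      # symmetric extension of f
    f = 4.0 / (N2_of_w(w_folded) * phi_y_of_w(w_folded))
    g = np.real(np.fft.ifft(1.0 / f))              # covariance-like sequence
    P = np.array([[g[abs(i - j)] for j in range(n)] for i in range(n)])
    return np.linalg.inv(P)

# Example 1 shapes: Phi_y = 1 and N^2(w) = 1 / ((pi/10)^2 + w^2).
R1 = fskf_R1(lambda w: 1.0 / ((np.pi / 10) ** 2 + w ** 2),
             lambda w: np.ones_like(w), n=20)

# By (76), (1/n) W*(w) R1 W(w) should approximate f(w) between grid points.
w0 = 1.5
W = np.exp(1j * np.arange(1, 21) * w0)
approx = float(np.real(W.conj() @ R1 @ W)) / 20
target = 4.0 * ((np.pi / 10) ** 2 + w0 ** 2)
print(approx, target)
```

The resulting R1 is symmetric and positive definite, and with the remaining scale factor \hat r_2 it is plugged into the covariance update (32).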
VI. An Alternative Motivation

The methods presented in Section V all hinge on the formula (74). This formula will now be motivated in a completely different way.

Introduce the simplified notation \Phi_\omega(t) = \Phi_y(\omega, t), and consider a random walk model of the spectrum,

    \Phi_\omega(t+1) = \Phi_\omega(t) + s_\omega(t)    (81)

Let S_\omega = \text{Cov}(s_\omega(t)) be the covariance of the random walk. A large S_\omega implies fast variations in the spectrum at frequency \omega. However, the size of S_\omega is scale dependent. It would be much more logical to let S_\omega be proportional to the spectrum itself, trying to capture relative changes in the spectrum. Therefore, the state noise s_\omega(t) should be proportional to the spectrum, s_\omega(t) = \Phi_\omega(t) \bar s_\omega(t). A logarithmic state transformation is now very natural:

    \log(\Phi_\omega(t+1)) = \log(\Phi_\omega(t)) + \log(1 + \bar s_\omega(t)) \approx \log(\Phi_\omega(t)) + \bar s_\omega(t)    (82)

The approximation is valid if \bar s_\omega(t) is small (which it must be for the spectrum to be a logical measure of frequency content; see the discussion in Section II). Now it is possible to say that if \sqrt{\bar S_\omega} is twice as large at one frequency as at another, then spectral changes at the first frequency can be expected to be twice as large. The designed random walk covariance should then be chosen inversely proportional to the specified squared time resolution,

    \bar S_\omega = \text{Cov}(\bar s_\omega(t)) = c N^{-2}(\omega)    (83)

Consider now the relation between \bar S_\omega and \hat R_1(t). A Taylor expansion of (26) and (27) at \theta = \theta(t) gives

    \log(\Phi_\omega(t)) = \log(\hat r_2) - \log(1 + W^*(\omega)\theta(t)) - \log(1 + \theta^T(t) W(\omega))
    \approx \log(\hat\Phi_\omega(t)) - 2 \, \text{real}\!\left( \frac{(1 + W^*(\omega)\theta(t)) W^*(\omega)}{|1 + W^*(\omega)\theta(t)|^2} \, \tilde\theta(t) \right)
    = \log(\hat\Phi_\omega(t)) + \psi_\omega^*(t) \tilde\theta(t), \qquad \psi_\omega(t) = e^{i\xi(t,\omega)} \frac{2 W(\omega)}{|1 + W^*(\omega)\theta(t)|}    (84)

for some phase argument \xi(t, \omega). Now

    \bar S_\omega = \text{Cov}\big( \log(\Phi_\omega(t+1)) - \log(\Phi_\omega(t)) \big)    (85)
    = \text{Cov}\big( \psi_\omega^*(t+1) \theta(t+1) - \psi_\omega^*(t) \theta(t) \big)    (86)

In the end, \psi_\omega(t) has to be approximated by plugging in the current parameter estimate, so assume \psi_\omega(t+1) \approx \psi_\omega(t). Then

    \bar S_\omega \approx \text{Cov}\big( \psi_\omega^*(t) (\theta(t+1) - \theta(t)) \big) = \psi_\omega^*(t) \hat R_1 \psi_\omega(t) = \frac{4}{|1 + W^*(\omega)\theta(t)|^2} W^*(\omega) \hat R_1 W(\omega) = \frac{4}{\hat r_2} \hat\Phi_y(\omega, t) \, W^*(\omega) \hat R_1 W(\omega)    (87)

Combining (83) and (87) gives

    \frac{1}{n \hat r_2} W^*(\omega) \hat R_1(t) W(\omega) = \frac{c}{4 n N^2(\omega) \hat\Phi_y(\omega, t)}    (88)

which is equivalent to (74) if the scaling factor is chosen as c = 16n. That is, for a given model structure AR(n), these two approaches coincide, both having one scalar as a design parameter. Thus, we have confirmed that (74) is a logical basis for the design.
VII. Numerical Illustrations

Fig. 2. Approximation ability of specified time resolution of the three methods for model orders 5 (upper plot) and 20 (lower plot). The curves show N(ω) versus ω [rad] for the specified time window and the direct Toeplitz, inverse Toeplitz, and least squares methods.
A. Achieved frequency resolution

An open question from Example 1 is what happens if the true spectrum is replaced by its estimate, as suggested in Algorithm 1. For this purpose, an AR(n) model is estimated from white noise of length N = 500. That is,

    θ_opt = (1, 0, ..., 0)ᵀ.

Then R̂_1 is computed from the three methods described in Section V, and the time resolution N(ω) is evaluated using equation (74). Figure 2 shows the result, averaged over the last N − 5n values of N̂(ω, t), for n = 5 and n = 20, respectively. The plots show the accuracy of Algorithm 1, that is, how well equation (74) can be solved for R̂_1. The accuracy is very good for the higher model order, except at very low frequencies.

B. Frequency tracking

In this subsection a piecewise constant fourth order AR model is studied. The three different parameter vectors correspond to the spectra in Figure 3. That is, first the AR coefficients are chosen to give two resonance peaks. Then they are changed such that the peak at the lower frequency (0.6) is attenuated and then changed back again. Then, in the same way, the high frequency (1.2) peak is attenuated and then changed back again.

The tracking ability of the different methods will be illustrated at the two marked frequencies (0.6 and 1.2) in Figure 3. An AR(4) model is estimated first with RLS with forgetting factor 0.98, and then with a Kalman filter with R̂_1 = 0.0001·I, and the spectrum at these two frequencies is evaluated at each time instant using the recursively estimated parameters. The result is shown in Figure 4 for RLS and in Figure 5 for the Kalman filter. Figure 6 shows the same thing for the frequency selective Kalman filter in Algorithm 1, with 1/r̂² = 0.001 and

    N(ω) = 1/ω.    (89)

The recursive parameter estimate is in all cases averaged over 50 realizations. Note the following:
- RLS has the overall best performance, a fact stemming from the abrupt parameter changes, where RLS is in a sense the best linear tracking algorithm.
- The tracking ability of RLS and the KF is independent of the two frequencies, and the variance error seems to be the same.
- The tracking ability of the FSKF is much better for the higher frequency than for the lower one, at the cost of increased variance.

We stress again that the optimal choice of R̂_1 depends on the signal and nothing else, but with the advocated choice we get a predefined, frequency dependent time resolution.

VIII. Conclusions

In this contribution, we have focused on the time and frequency resolution of several parametric methods for spectral estimation, using the terminology of the non-parametric context. With the parametric approach, we computed the spectrum from a recursively estimated AR model. It was shown that the time window, that is, the effective number of samples used to compute the spectrum at a certain frequency, is inherently frequency independent for common adaptive methods such as LMS and RLS. The time resolution depends only on the design parameters and the spectrum itself.

We have shown how the general Kalman filter formulation allows us to shape the time resolution arbitrarily with frequency.
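The tracking experiment of Section VII-B can be sketched in a few lines. This is our own minimal reconstruction, not the authors' code: the pole radius 0.95, the unit innovation variance, the initialization P(0) = 100·I, and the function names are assumptions, while the design values (forgetting factor 0.98, R̂_1 = 10⁻⁴·I) follow the text. The AR model is kept time invariant here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(a, T):
    """Simulate y(t) = -a1*y(t-1) - ... - an*y(t-n) + e(t), e ~ N(0, 1)."""
    n = len(a)
    y = np.zeros(T + n)
    for t in range(n, T + n):
        y[t] = -a @ y[t - n:t][::-1] + rng.standard_normal()
    return y[n:]

def track(y, n=4, method="rls", lam=0.98, R1=None):
    """Recursive AR coefficient estimation: RLS with exponential forgetting,
    or a Kalman filter with parameter random walk covariance R1."""
    theta = np.zeros(n)
    P = 100.0 * np.eye(n)                     # assumed initial covariance
    est = np.zeros((len(y), n))
    for t in range(n, len(y)):
        phi = -y[t - n:t][::-1]               # regressor (-y(t-1), ..., -y(t-n))
        if method == "kf":
            P = P + R1                        # time update: random walk in theta
        denom = (lam if method == "rls" else 1.0) + phi @ P @ phi
        K = P @ phi / denom
        theta = theta + K * (y[t] - phi @ theta)
        P = P - np.outer(K, phi) @ P
        if method == "rls":
            P = P / lam                       # forgetting
        est[t] = theta
    return est

# AR(4) with resonance peaks near 0.6 and 1.2 rad (pole radius 0.95 assumed)
poles = 0.95 * np.exp(1j * np.array([0.6, -0.6, 1.2, -1.2]))
a = np.real(np.poly(poles))[1:]

y = simulate_ar(a, 2000)
est_rls = track(y, method="rls", lam=0.98)
est_kf = track(y, method="kf", R1=1e-4 * np.eye(4))
```

Replacing the scalar covariance 10⁻⁴·I with the frequency-shaped R̂_1 of Algorithm 1 is what turns this generic Kalman filter into the frequency selective one studied in the experiment.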
Fig. 3. Spectra for three AR(4) models.

Fig. 5. The spectral estimate Φ̂_y(ω, t) at two frequencies (upper plot: low, lower plot: high) using the Kalman filter (design parameter 0.0001).
Fig. 4. The spectral estimate Φ̂_y(ω, t) at two frequencies (upper plot: low, lower plot: high) using RLS (forgetting factor 0.98).

Fig. 6. The spectral estimate Φ̂_y(ω, t) at two frequencies (upper plot: low, lower plot: high) using the frequency selective Kalman filter (design parameter 0.001).

It is just a matter of using the "state noise covariance matrix" R̂_1 as a design variable.

This could, e.g., be used so that the time resolution increases with frequency, similar to the wavelet transform. Thus, the proposed method offers a default choice of R̂_1 which is not ad hoc, in contrast to the RLS and LMS algorithms.
Appendix I. Some useful matrix results

In this section we summarize some matrix results that are essential for the derivations in the paper. The first result states that given a matrix A with Toeplitz structure, i.e.

    A = [ a_0    a_{-1}  a_{-2}  ...  a_{-n}
          a_1    a_0     a_{-1}  ...  a_{-n+1}
          ...                         ...
          a_n    ...     a_2     a_1  a_0      ]    (90)

we have the standard result

    (1/n) W*(ω) A W(ω) = Σ_{τ=−n+1}^{n−1} (1 − |τ|/n) a_τ e^{−iτω} → a(ω)  as n → ∞    (91)

where

    a(ω) = Σ_{τ=−∞}^{∞} a_τ e^{−iωτ}.    (92)

In particular this means that if A is the covariance matrix of a signal, the limit tends to the spectrum of the signal. Furthermore we have, see [13], that

    lim_{n→∞} (1/n) W*(ω) A⁻¹ W(ω) = 1/a(ω)    (93)

which means that the limit operation applied to the inverse of a covariance matrix results in the inverse of the spectrum.

In [5] corresponding results are derived for a class of matrices that attain Toeplitz structure when their dimensions increase. The matrices in this class also have the property that the elements decay exponentially, i.e.

    |a_τ| ≤ C λ^{|τ|}    (94)

for 0 < λ < 1 and some constant C. For matrices in this class it is shown that

    lim_{n→∞} (1/n) W*(ω) A B W(ω) = a(ω) b(ω)    (95)

where, as before,

    a(ω) = Σ_{τ=−∞}^{∞} a_τ e^{−iωτ}    (96)

and

    b(ω) = Σ_{τ=−∞}^{∞} b_τ e^{−iωτ}.    (97)
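The basic limit (91) is easy to check numerically. The sketch below is our own (not part of the paper), assuming W(ω) = (e^{−iω}, ..., e^{−inω})ᵀ; it uses the covariance sequence a_τ = ρ^{|τ|}, whose spectrum is a(ω) = (1 − ρ²)/(1 − 2ρ cos ω + ρ²).

```python
import numpy as np

def quad_form_avg(a_tau, n, w):
    """(1/n) W*(w) A W(w) for the n x n Toeplitz matrix A with A[i, j] = a_tau(i - j)."""
    idx = np.arange(n)
    A = a_tau(idx[:, None] - idx[None, :])      # Toeplitz matrix built from a_tau
    W = np.exp(-1j * w * np.arange(1, n + 1))   # assumed form of W(w)
    return np.real(W.conj() @ A @ W) / n

rho, w, n = 0.5, 1.0, 500
val = quad_form_avg(lambda t: rho ** np.abs(t), n, w)
spectrum = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)
# val approaches `spectrum` as n grows, as stated in (91)
```

The finite-n error comes from the triangular weight (1 − |τ|/n) in (91) and decays like 1/n for exponentially decaying covariances, which is easily seen by increasing n in the snippet.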

References

[1] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.
[2] P. Flandrin. "A time-frequency formulation of optimum detection". IEEE Trans. Acoustics, Speech and Signal Processing, 36:1377-1384, 1988.
[3] W.A. Gardner. Statistical Spectral Analysis. Prentice Hall, 1988.
[4] S. Gunnarsson. "Frequency domain accuracy of recursively identified ARX models". International Journal of Control, 54:465-480, 1991.
[5] S. Gunnarsson and L. Ljung. "Frequency domain tracking characteristics of adaptive algorithms". IEEE Trans. Acoustics, Speech and Signal Processing, 37:1072-1089, 1989.
[6] L. Guo and L. Ljung. "Performance analysis of general tracking algorithms". IEEE Trans. Automatic Control, AC-40:1388-1402, 1995.
[7] F. Hlawatsch and G.F. Boudreaux-Bartels. "Linear and quadratic time-frequency signal representations". IEEE Signal Processing Magazine, 9(2):21-68, 1992.
[8] G.M. Jenkins and D.G. Watts. Spectral Analysis and Its Applications. Holden-Day, 1968.
[9] S.M. Kay. Modern Spectral Estimation. Prentice Hall, 1988.
[10] G. Kitagawa and W. Gersch. "A smoothness priors time-varying AR coefficient modeling of nonstationary covariance time series". IEEE Trans. Automatic Control, 30:48-65, 1985.
[11] L. Ljung and S. Gunnarsson. "Adaptation and tracking in system identification: a survey". Automatica, 26:7-21, 1990.
[12] L. Ljung and T. Soderstrom. Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA, 1983.
[13] L. Ljung and Z.D. Yuan. "Asymptotic properties of black-box identification of transfer functions". IEEE Trans. Automatic Control, AC-30:514-530, 1985.
[14] O. Rioul and M. Vetterli. "Wavelets and signal processing". IEEE Signal Processing Magazine, 8(4):14-38, 1991.
[15] B. Widrow and S.D. Stearns. Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1985.

Fredrik Gustafsson was born in 1964. He received the M.S. degree in electrical engineering in 1988 and the Ph.D. degree in automatic control in 1992, both from Linkoping University, Sweden. He is an associate professor in electrical engineering at Linkoping University. His research is focused on statistical methods in system identification and signal processing.

Svante Gunnarsson was born in 1959. He received his M.Sc. in 1983 and his Ph.D. in 1988, both from Linkoping University. Since 1989 he has been an associate professor at the Department of Electrical Engineering, Linkoping University. His research interests are system identification and adaptive control.

Lennart Ljung was born in Malmo, Sweden. He received the Ph.D. degree in 1974 from Lund University, Sweden. Since 1976 he has been Professor of the Chair of Automatic Control at Linkoping University, Sweden. Professor Ljung is an associate editor of several journals. He is a Fellow of the IEEE, a member of the Royal Swedish Academy of Engineering Sciences and the Royal Swedish Academy of Sciences, and an IFAC Advisor.
