
Lecture 11

Black Box System Identification: Time Domain

11.1 Overview

[Figure: block diagram of a system T with input x[n] and output y[n].]

The objective of system identification is to model the input-output behavior of a system T as well as possible. In this class, we cover Black Box system identification, which means that we do not have a first-principles model of the system. Black Box system identification is a purely data-driven modeling tool. It can be compared with:

Grey Box: We have a first-principles model, but with unknown parameters. Data is used to identify these parameters. This is covered in Prof. Guzzella's lecture System Modeling.

White Box: We have a first-principles model, with no unknown parameters. No data is required.

This categorization is a simplified view: in reality, combinations of these system identification methods are usually applied. Black Box methods are useful for:

- regulation tasks near equilibrium,
- improving already deployed systems,
- learning more about the system, as a first step.
There are Black Box system identification techniques for both time domain and frequency
domain. In this lecture, we introduce time domain techniques. Frequency domain methods
will be introduced in the following lecture.

11.2 ARX Models

The simplest form of time-domain system identification uses a model named Autoregressive with exogenous input (ARX), which has the following structure:

$$\underbrace{\sum_{k=0}^{N} a_k\, y[n-k]}_{\text{output}} = \underbrace{\sum_{k=0}^{M} b_k\, x[n-k]}_{\text{input}} + \underbrace{e[n]}_{\text{error}}, \qquad (11.1)$$

where we choose a0 = 1 to ensure that the resulting system is causal. We introduce the vector of unknown variables

$$\theta = \begin{bmatrix} a_1 & a_2 & \dots & a_N & b_0 & b_1 & \dots & b_M \end{bmatrix}^T,$$

which has size N + M + 1 and where (·)^T denotes the transpose. The user-defined design variables in the model are N and M.
The transfer function from X(z) to Y(z) is

$$H(z) = \frac{Y(z)}{X(z)} = \frac{b_0 + b_1 z^{-1} + \dots + b_M z^{-M}}{1 + a_1 z^{-1} + \dots + a_N z^{-N}} =: \frac{B(z)}{A(z)}.$$

Including the error, we obtain

$$Y(z) = \frac{B(z)}{A(z)}\, X(z) + \frac{1}{A(z)}\, E(z). \qquad (11.2)$$

q-Notation

We introduce the q-notation, which is frequently used in the system identification literature and software. The operator q is a time shift operator:

$$q^{-1} x[n] = x[n-1].$$

We may rewrite equation (11.2) using q as

$$y[n] = \frac{B(q)}{A(q)}\, x[n] + \frac{1}{A(q)}\, e[n].$$

The relationship of q and z is similar to the relationship between d/dt and s in continuous time: they both represent the same thing,

$$y[n] = H(q)\, x[n] \quad \Longleftrightarrow \quad Y(z) = H(z)\, X(z).$$

However, q acts directly on time signals, so we avoid defining transformations.
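In Matlab, this is exactly what the built-in filter function does: it applies a rational q-operator directly to a time signal. A minimal sketch of equation (11.2) in the time domain (the coefficient values are arbitrary placeholders, not from the lecture):

```matlab
% A(q) and B(q) as coefficient vectors in powers of q^-1 (arbitrary examples).
a = [1, -0.5];              % A(q) = 1 - 0.5 q^-1, with a0 = 1
b = [1,  0.3];              % B(q) = 1 + 0.3 q^-1

x = randn(100, 1);          % some input signal
e = 0.1 * randn(100, 1);    % small white noise

% y[n] = B(q)/A(q) x[n] + 1/A(q) e[n], acting directly on the time signals
y = filter(b, a, x) + filter(1, a, e);
```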



ARX Identification

We start with the system at rest. An input signal x[n] is fed into the system, and the output y[n] is observed for the interval 0 ≤ n ≤ K. Writing the ARX model equation (11.1) out for this interval, we obtain

$$y[0] = b_0 x[0] + e[0]$$
$$y[1] = b_0 x[1] + b_1 x[0] - a_1 y[0] + e[1]$$
$$\vdots$$
$$y[n] = \sum_{k=0}^{M} b_k x[n-k] - \sum_{k=1}^{N} a_k y[n-k] + e[n]$$
$$\vdots$$
$$y[K] = \sum_{k=0}^{M} b_k x[K-k] - \sum_{k=1}^{N} a_k y[K-k] + e[K].$$

All of these equations can be written as a (very large) matrix equation:

$$D\theta = Y - E,$$

$$D = \begin{bmatrix}
0 & 0 & \dots & 0 & x[0] & 0 & \dots & 0 \\
-y[0] & 0 & \dots & 0 & x[1] & x[0] & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
-y[n-1] & -y[n-2] & \dots & -y[n-N] & x[n] & x[n-1] & \dots & x[n-M] \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
-y[K-1] & -y[K-2] & \dots & -y[K-N] & x[K] & x[K-1] & \dots & x[K-M]
\end{bmatrix},$$

$$Y = \begin{bmatrix} y[0] \\ y[1] \\ \vdots \\ y[n] \\ \vdots \\ y[K] \end{bmatrix}, \qquad
E = \begin{bmatrix} e[0] \\ e[1] \\ \vdots \\ e[n] \\ \vdots \\ e[K] \end{bmatrix},$$

where

- D: (K + 1) × (N + M + 1) matrix, known.
- Y: (K + 1) vector, known.
- θ: (N + M + 1) vector, unknown.
- E: (K + 1) vector, unknown.

When K ≫ N + M, D is a very thin matrix, meaning that it has many more rows than it has columns. We find θ by solving the following optimization problem:

$$\min_\theta\, (Y - D\theta)^T (Y - D\theta) = \min_\theta\, E^T E, \qquad (11.3)$$

which has the interpretation of finding the parameters θ that minimize the norm of the error vector. These are the parameters that explain the data with as little error as possible. The minimization can be solved using least squares:

$$E^T E = Y^T Y - Y^T D\theta - \theta^T D^T Y + \theta^T D^T D\, \theta$$
$$\frac{d\, E^T E}{d\theta} = 0$$
$$-2 D^T Y + 2 D^T D\, \theta^* = 0$$
$$\theta^* = (D^T D)^{-1} D^T Y,$$

where θ* is the minimizing parameter vector. D^T D is a matrix of size (N + M + 1) × (N + M + 1), which is typically relatively small. The inversion of this matrix is therefore numerically inexpensive.

It is important to note that D^T D needs to be invertible. D only depends on the input/output data, meaning that this data must fulfill certain criteria: the input x[n] must be rich. This is called persistence of excitation. This concept will become clearer in the next lecture. An example of a signal that is not rich is

$$x[n] = 1 \quad \forall n.$$
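The identification step translates almost line by line into code. The following sketch (our illustration, not the course script) builds D row by row and solves the normal equations; samples with negative time indices are zero because the system starts at rest:

```matlab
% ARX least-squares identification from column vectors x, y of length K+1.
% Matlab index i corresponds to time index n = i-1; N, M are the model orders.
K = length(y) - 1;
D = zeros(K+1, N+M+1);
for n = 0:K
    for k = 1:N                              % output columns: -y[n-k]
        if n-k >= 0, D(n+1, k) = -y(n-k+1); end
    end
    for k = 0:M                              % input columns: x[n-k]
        if n-k >= 0, D(n+1, N+1+k) = x(n-k+1); end
    end
end
theta = (D'*D) \ (D'*y);                     % theta* = (D'D)^-1 D'Y
a_hat = [1; theta(1:N)];                     % coefficients of A_hat(q), a0 = 1
b_hat = theta(N+1:end);                      % coefficients of B_hat(q)
```

In practice one would write theta = D \ y, which solves the same least-squares problem with better numerical properties than forming D'D explicitly.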

ARX Validation

After identifying a model for a system, it is important to validate the model. A common validation method is to use one data set for identification and another data set for validation. This helps to avoid overfitting: if one chooses N or M too high, the model overfits the data by learning properties that are specific to the identification input/output data set and do not generalize to other data sets.

To check the quality of the model, we compare the model output to the true system output on the validation data set. We compare the predicted output ŷ[n] to the measured output y[n], given the known applied input x[n]:

$$y[n] = \frac{B(q)}{A(q)}\, x[n] + \frac{1}{A(q)}\, e[n]$$
$$\hat{y}[n] = \frac{\hat{B}(q)}{\hat{A}(q)}\, x[n],$$

where B̂(q)/Â(q) is the model that has been identified with the identification data set. If the noise is small, ŷ[n] should be close to y[n]. Furthermore, the inferred error ê[n] that explains the discrepancy between y[n] and ŷ[n] should be a white noise signal. If the error signal contains more power at certain frequencies, it is likely that the model does not accurately capture the system dynamics at these frequencies. To check whether the error signal is white, we first infer the signal from the measurement data and the identified model:

$$y[n] = \frac{\hat{B}(q)}{\hat{A}(q)}\, x[n] + \frac{1}{\hat{A}(q)}\, \hat{e}[n]$$
$$\hat{e}[n] = \hat{A}(q)\, y[n] - \hat{B}(q)\, x[n]. \qquad (11.4)$$

As was discussed in a previous lecture, one can check whether ê[n] is white by using the autocorrelation function

$$R_{\hat{e}\hat{e}}[k] = \frac{1}{K+1} \sum_{n=0}^{K} \hat{e}[n]\, \hat{e}[n-k],$$

where wrap-around is used (i.e. it is assumed that ê[n - K] = ê[n]).
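Since Â(q) and B̂(q) are polynomials in q^-1, equation (11.4) amounts to two FIR filters. A sketch of the whiteness check, reusing a_hat and b_hat from the identification sketch above:

```matlab
% Inferred error, cf. (11.4): e_hat[n] = A_hat(q) y[n] - B_hat(q) x[n]
e_hat = filter(a_hat, 1, y) - filter(b_hat, 1, x);

% Autocorrelation with wrap-around (circular shift of the error signal)
K   = length(e_hat) - 1;
Ree = zeros(K+1, 1);
for k = 0:K
    Ree(k+1) = sum(e_hat .* circshift(e_hat, k)) / (K+1);
end
stem(0:K, Ree);   % for white noise: essentially a single peak at lag 0
```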

11.3 Comparison with other models

The model structure of ARX can be sketched as follows:

[Figure: x[n] passes through B(q)/A(q), e[n] passes through 1/A(q), and the two are summed to give y[n].]

This is not a very natural model. There is no good reason why e[n] should pass through 1/A(q). The motivation for this model is mathematical convenience. There are much better methods, implemented for example in the Matlab System Identification Toolbox.
Due to the model structure, ARX performs especially poorly when the poles of H(z) = B(z)/A(z) are close to the unit circle: small errors e[n] are greatly amplified by 1/A(q), causing large (undesired) changes in the output y[n]. Systems with fast discretization usually have this problem: if the underlying continuous-time model has a pole at p, this pole gets mapped to the discrete-time pole e^{pTs}, where Ts is the sampling period. If Ts is small,

$$e^{p T_s} \approx 1 + p T_s,$$

which means that the pole of H(z) is close to 1.
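A quick numeric check of this approximation, with an arbitrarily chosen stable pole and a small sampling period:

```matlab
p  = -2;                 % continuous-time pole (arbitrary example)
Ts = 0.001;              % small sampling period
exact  = exp(p*Ts)       % discrete-time pole: 0.9980...
approx = 1 + p*Ts        % first-order approximation: 0.9980
```

Both values lie very close to 1, i.e. close to the unit circle.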
OE (Output Error)

[Figure: x[n] passes through B(q)/A(q), and e[n] is added directly at the output to give y[n].]

This models the main source of errors as white sensor noise.
ARMAX (Autoregressive Moving Average with exogenous input)

[Figure: x[n] passes through B(q)/A(q), e[n] passes through C(q)/A(q), and the two are summed to give y[n].]

This model is well suited for input disturbances, as the noise input shares the plant poles. The additional term C(q) gives flexibility that ARX does not have:

$$A(q)\, y[n] = B(q)\, x[n] + C(q)\, e[n].$$
We can, for example, model input noise by choosing C(q) = B(q):

[Figure: e[n] is added to the input x[n], and the sum passes through B(q)/A(q) to give y[n].]
BJ (Box-Jenkins)

[Figure: x[n] passes through B(q)/A(q), e[n] passes through C(q)/D(q), and the two are summed to give y[n].]

This is the most general model. It is good when the sensor noise is not white, as the transfer function from the error e[n] to the output y[n] is completely independent of the input-to-output transfer function.

The identification of OE, ARMAX, and BJ models must be solved iteratively, because they cannot easily be rewritten in a form similar to (11.2). This is computationally more expensive and more involved.
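For reference, all four structures are available as estimators in the Matlab System Identification Toolbox. A sketch, assuming measured column vectors x, y with sampling period Ts; the order vectors are illustrative (in the toolbox convention, nb counts the B coefficients, nk is the input delay, and the OE/BJ denominator is called F(q)):

```matlab
data = iddata(y, x, Ts);                   % package the input/output data

m_arx   = arx(data,   [N, M+1, 0]);        % ARX:   A y = B x + e
m_oe    = oe(data,    [M+1, N, 0]);        % OE:    y = (B/F) x + e
m_armax = armax(data, [N, M+1, N, 0]);     % ARMAX: A y = B x + C e
m_bj    = bj(data,    [M+1, N, N, N, 0]);  % BJ:    y = (B/F) x + (C/D) e
```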

11.4 Example ARX System Identification

In the following, we show plots of the results of an example ARX system identification.

[Figure: the true system, where x[n] passes through H(q) and e[n] through 1/A(q) to produce y[n], next to the model, where x[n] passes through Ĥ(q) and e[n] through 1/Â(q) to produce ŷ[n].]

$$H(q) = \frac{B(q)}{A(q)}, \qquad
\hat{H}(q) = \frac{\hat{B}(q)}{\hat{A}(q)} = \frac{\sum_{i=0}^{M} \hat{b}_i\, q^{-i}}{1 + \sum_{i=1}^{N} \hat{a}_i\, q^{-i}}$$

The goal is to identify the unknown system H(q) by finding the parameters â_i, b̂_i of a system model Ĥ(q) which result in the closest fit of the outputs ŷ[n] and y[n] given the same input x[n].

Procedure

We use white noise as input and generate two independent input signals x1[n], x2[n]. For each input signal, the corresponding output signal

$$y_i[n] = H(q)\, x_i[n] + \frac{1}{A(q)}\, e_i[n]$$

is measured, where the noise signals e_i[n] are white. The first data set, {x1[n], y1[n]}, is used to identify the model Ĥ(q). The second data set, {x2[n], y2[n]}, is used as a validation set. We then choose the design parameters N and M and apply the ARX method. Using least squares, we find the unknown â_i, b̂_i of the model Ĥ.
All of this is done in a Matlab script, which is available on the course website. Because the true system H(q) is also simulated in Matlab, we know the underlying unknown system. It has the transfer function

$$H(q) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2} + b_3 q^{-3} + b_4 q^{-4} + b_5 q^{-5} + b_6 q^{-6} + b_7 q^{-7}}{1 + a_1 q^{-1} + a_2 q^{-2} + a_3 q^{-3} + a_4 q^{-4} + a_5 q^{-5}}.$$

The actual system parameters a_i, b_i are available in the script file. The following plots show the impact that the design parameters N and M have on the time-domain and frequency-domain fit of the model system Ĥ(q) to the actual system H(q).

We analyze three different cases of parameter sets (a code sketch of the identification/validation loop follows the list):

1. Case 1: M = 7, N = 5: we choose the design parameters matching the actual system.
2. Case 2: M = 28, N = 20: we overestimate the design parameters.
3. Case 3: M = 3, N = 2: we underestimate the design parameters.
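A minimal sketch of this loop (our illustration, not the course script; the true coefficients below are invented stable placeholders for the values in the script file):

```matlab
% Invented placeholder system with N = 5, M = 7 and stable poles.
a_true = poly([0.5, -0.4, 0.3+0.3i, 0.3-0.3i, 0.2]);   % A(q)
b_true = [1, 0.8, 0.4, 0.2, 0.1, 0.05, 0.02, 0.01];    % B(q)

K  = 1000;                                   % samples per data set
x1 = randn(K, 1);  x2 = randn(K, 1);         % two independent white inputs
e1 = sqrt(3*0.01) * (2*rand(K, 1) - 1);      % uniform noise with variance 0.01
e2 = sqrt(3*0.01) * (2*rand(K, 1) - 1);
y1 = filter(b_true, a_true, x1) + filter(1, a_true, e1);
y2 = filter(b_true, a_true, x2) + filter(1, a_true, e2);

id  = iddata(y1, x1, 1);                     % identification data set
val = iddata(y2, x2, 1);                     % validation data set
for NM = [5 7; 20 28; 2 3]'                  % the three cases, columns [N; M]
    model = arx(id, [NM(1), NM(2)+1, 0]);    % identify on data set 1
    figure; compare(val, model);             % validate on data set 2
end
```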

Identification

In the first step, we use the identification data set {x1[n], y1[n]} to find the model parameters â_i, b̂_i using least squares, i.e. applying the ARX identification procedure. We perform this identification for each case and compare the time-domain data fits in the following plots. We further show the autocorrelation of the inferred noise ê[n] (11.4) that explains the measured output data given the model system, in an ARX sense.
[Figure (Identification Data): outputs y1[n] of the actual system and ŷ1[n] of the identified models for Cases 1-3, plotted over index n.]

Fit of the time-domain data for the identification data set. Both Case 1 and Case 2 deliver a good fit to the data. The underestimation of M and N in Case 3 leads to a poor fit.

[Figure (Identification Data): autocorrelation Rê1ê1[k] of the inferred noise for Cases 1-3, plotted over lag k.]

Autocorrelation of the inferred noise ê1[n] on the identification data set. Note that Rê1ê1[0] is proportional to E^T E, the sum of squares of the error vector, which is what the least-squares procedure minimizes. Therefore, both Case 1 and Case 2 deliver comparably close fits, with Case 2 performing a bit better. For these two cases, the autocorrelation function is also close to a scaled unit impulse, i.e. the error signal ê1[n] is indeed close to white noise. The scale factor should be close to the variance of the introduced noise. In the Matlab script, we generate the noise e1[n] from a uniform distribution with variance 0.01. The first two cases indeed get close to this number. Case 3 delivers the worst fit, as the assumed model does not have enough degrees of freedom to closely match the data. Some of the system dynamics that the identified model cannot reproduce now appear in the error vector. This results in a harmonic component in the error autocorrelation of Case 3.
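This claim can be checked numerically; a small sketch, reusing e_hat and Ree from the whiteness-check sketch above:

```matlab
% For an (almost) white residual, Ree[0] should be close to the noise variance.
Ree(1)              % empirical Ree[0] = sum(e_hat.^2) / (K+1)
var(e_hat, 1)       % biased sample variance; similar when mean(e_hat) is ~0
```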


Validation

After identifying the model parameters â_i, b̂_i, we use the second data set, {x2[n], y2[n]}, to check the identified model. Ideally, the identified model now predicts the output of the system for any new input. This procedure helps avoid overfitting, as will become clear in the following plots.
[Figure (Validation Data): outputs y2[n] of the actual system and ŷ2[n] of the identified models for Cases 1-3, plotted over index n.]

Output of the identified models for the validation data set. The fits of Case 1 and Case 2 are not as tight as in the identification step anymore; however, they are still acceptable. Case 3 is still bad.


[Figure (Validation Data): autocorrelation Rê2ê2[k] of the inferred error for Cases 1-3, plotted over lag k.]

The difference between Case 1 and Case 2 becomes evident when analyzing the autocorrelation function of the inferred error signal ê2[n]: while the error of the fit for Case 2 is smaller than that of Case 1 in the identification step, we observe a larger error in the validation step. This is an effect of overfitting: for Case 2, the design parameters M and N are too high.


Comparison of Frequency Responses

[Figure: magnitude frequency responses of the actual system H and the identified models Ĥ for Cases 1-3, in dB, plotted over frequency.]
