
Nonlinear system identification

System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.

A nonlinear system is defined as any system that is not linear, that is any system that does not
satisfy the superposition principle. This negative definition tends to obscure that there are very
many different types of nonlinear systems. Historically, system identification for nonlinear
systems[1][2] has developed by focusing on specific classes of system and can be broadly categorized
into five basic approaches, each defined by a model class:

1. Volterra series models,

2. Block-structured models,

3. Neural network models,

4. NARMAX models, and

5. State-space models.

There are four steps to be followed for system identification: data gathering, model postulation, parameter identification, and model validation. Data gathering is the first and essential part of the identification process, since the gathered records are used as the input for the model that is constructed later. It consists of selecting an appropriate data set, pre-processing, and processing, and involves implementing the chosen algorithms together with the transcription of flight tapes, data storage and data management, calibration, processing, analysis, and presentation. Moreover, model validation is necessary to gain confidence in, or reject, a particular model. In particular, parameter estimation and model validation are integral parts of system identification. Validation refers to the process of confirming the conceptual model and demonstrating an adequate correspondence between the computational results of the model and the actual data.[3]
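As an illustrative sketch of these four steps, the Python example below (the simulated process, the postulated model form, and the code itself are assumptions made for illustration, not taken from any particular reference) gathers data from a hypothetical process, postulates a simple linear-in-the-parameters model, identifies its parameters by least squares, and validates the fit on held-out data.

```python
import numpy as np

# 1. Data gathering: simulate a hypothetical process
#    y(k) = 0.5*y(k-1) + u(k-1)^2 + noise
rng = np.random.default_rng(0)
N = 400
u = rng.uniform(-1, 1, N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.5 * y[k - 1] + u[k - 1] ** 2 + 0.05 * rng.standard_normal()

# 2. Model postulation: y(k) ~ a*y(k-1) + b*u(k-1) + c*u(k-1)^2
#    (linear in the parameters a, b, c)
X = np.column_stack([y[:-1], u[:-1], u[:-1] ** 2])
target = y[1:]

# Split the records into estimation and validation sets
split = len(target) // 2
X_est, X_val = X[:split], X[split:]
t_est, t_val = target[:split], target[split:]

# 3. Parameter identification: ordinary least squares
theta, *_ = np.linalg.lstsq(X_est, t_est, rcond=None)
print("estimated parameters:", theta)

# 4. Model validation: compare one-step-ahead predictions with held-out data
pred = X_val @ theta
rmse = np.sqrt(np.mean((pred - t_val) ** 2))
print("validation RMSE:", rmse)
```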

Volterra series methods

The early work was dominated by methods based on the Volterra series, which in the discrete time case can be expressed as

$$y(k) = \sum_{\ell=1}^{L}\;\sum_{m_1=0}^{M}\cdots\sum_{m_\ell=0}^{M} h_\ell(m_1,\ldots,m_\ell)\,\prod_{i=1}^{\ell} u(k-m_i)$$

where u(k), y(k); k = 1, 2, 3, ... are the measured input and output respectively and $h_\ell(m_1,\ldots,m_\ell)$ is the $\ell$th-order Volterra kernel, or $\ell$th-order nonlinear impulse response. The Volterra series is an extension of the
linear convolution integral. Most of the earlier identification algorithms assumed that just the first
two, linear and quadratic, Volterra kernels are present and used special inputs such as Gaussian
white noise and correlation methods to identify the two Volterra kernels. In most of these
methods the input has to be Gaussian and white which is a severe restriction for many real
processes. These results were later extended to include the first three Volterra kernels, to allow
different inputs, and other related developments including the Wiener series. A very important
body of work was developed by Wiener, Lee, Bose and colleagues at MIT from the 1940s to the
1960s including the famous Lee and Schetzen method.[4][5] While these methods are still actively
studied today there are several basic restrictions. These include the necessity of knowing the
number of Volterra series terms a priori, the use of special inputs, and the large number of
estimates that have to be identified. For example, for a system where the first order Volterra kernel
is described by, say, 30 samples, 30×30 points will be required for the second order kernel, 30×30×30 for the third order, and so on; hence the amount of data required to provide good
estimates becomes excessively large.[6] These numbers can be reduced by exploiting certain
symmetries but the requirements are still excessive irrespective of what algorithm is used for the
identification.
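To make this data-requirement issue concrete, the following Python sketch (an illustration using arbitrarily assumed kernel values, not an identification algorithm from the literature) evaluates a truncated second-order Volterra model with a 30-sample memory and counts the number of kernel coefficients that would have to be estimated at each order.

```python
import numpy as np

M = 30  # memory length (number of samples describing the first-order kernel)

# Illustrative first- and second-order Volterra kernels (arbitrary choices)
h1 = 0.8 ** np.arange(M)        # first-order kernel: M coefficients
h2 = np.outer(h1, h1) * 0.1     # second-order kernel: M*M coefficients

def volterra_output(u, h1, h2):
    """Output of a truncated (second-order) discrete-time Volterra series."""
    M = len(h1)
    y = np.zeros(len(u))
    for k in range(M, len(u)):
        past = u[k - M + 1:k + 1][::-1]      # u(k), u(k-1), ..., u(k-M+1)
        y[k] = h1 @ past + past @ h2 @ past  # linear term + quadratic term
    return y

u = np.random.default_rng(1).standard_normal(200)
y = volterra_output(u, h1, h2)

# The number of coefficients to estimate grows as M, M^2, M^3, ...
print("first-order kernel:", M, "coefficients")
print("second-order kernel:", M * M, "coefficients")
print("third-order kernel would need:", M ** 3, "coefficients")
```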

Block-structured systems

Because of the problems of identifying Volterra models other model forms were investigated as a
basis for system identification for nonlinear systems. Various forms of block structured nonlinear
models have been introduced or re-introduced.[6][7] The Hammerstein model consists of a static
single valued nonlinear element followed by a linear dynamic element.[8] The Wiener model is the
reverse of this combination so that the linear element occurs before the static nonlinear
characteristic.[9] The Wiener-Hammerstein model consists of a static nonlinear element sandwiched
between two dynamic linear elements, and several other model forms are available. The
Hammerstein-Wiener model consists of a linear dynamic block sandwiched between two static
nonlinear blocks.[10] The Urysohn model[11][12] differs from the other block models: it does not consist of a sequence of linear and nonlinear blocks, but describes both dynamic and static nonlinearities in the expression of the kernel of an operator.[13] All these models can be
represented by a Volterra series but in this case the Volterra kernels take on a special form in each
case. Identification consists of correlation based and parameter estimation methods. The
correlation methods exploit certain properties of these systems, which means that if specific inputs
are used, often white Gaussian noise, the individual elements can be identified one at a time. This
results in manageable data requirements and the individual blocks can sometimes be related to
components in the system under study.

More recent results are based on parameter estimation and neural network based solutions. Many
results have been introduced and these systems continue to be studied in depth. One problem is
that these methods are only applicable to a very special form of model in each case and usually
this model form has to be known prior to identification.
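As a rough illustration of the parameter-estimation route for block-structured models, the sketch below assumes a Hammerstein structure: a polynomial static nonlinearity followed by a first-order linear dynamic block. Once the nonlinearity is expanded in a polynomial basis the unknown coefficients enter linearly, so ordinary least squares can estimate them. The simulated system, the basis, and the code are illustrative assumptions rather than a method prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
u = rng.uniform(-1, 1, N)

# Simulated Hammerstein system: static nonlinearity w = u + 0.5*u^3,
# followed by a first-order linear dynamic block y(k) = 0.7*y(k-1) + w(k-1)
w = u + 0.5 * u ** 3
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.7 * y[k - 1] + w[k - 1] + 0.01 * rng.standard_normal()

# Parameter-estimation approach: represent the static nonlinearity by a
# polynomial basis in u, so the overall model is linear in the parameters:
# y(k) = a*y(k-1) + b1*u(k-1) + b2*u(k-1)^2 + b3*u(k-1)^3
X = np.column_stack([y[:-1], u[:-1], u[:-1] ** 2, u[:-1] ** 3])
theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print("a  (linear dynamic block):", theta[0])
print("b1, b2, b3 (polynomial nonlinearity):", theta[1:])
```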

Neural networks

Artificial neural networks try loosely to imitate the network of neurons in the brain where
computation takes place through a large number of simple processing elements. A typical neural
network consists of a number of simple processing units interconnected to form a complex
network. Layers of such units are arranged so that data is entered at the input layer and passes
through either one or several intermediate layers before reaching the output layer. In supervised
learning the network is trained by operating on the difference between the actual output and the
desired output of the network, the prediction error, to change the connection strengths between
the nodes. By iterating, the weights are modified until the output error reaches an acceptable
level. This process is called machine learning because the network adjusts the weights so that the
output pattern is reproduced. Neural networks have been extensively studied and there are many
excellent textbooks devoted to this topic in general,[1][14] and more focused textbooks which
emphasise control and systems applications.[1][15] There are two main problem types that can be
studied using neural networks: static problems, and dynamic problems. Static problems
include pattern recognition, classification, and approximation. Dynamic problems involve lagged
variables and are more appropriate for system identification and related applications. Depending
on the architecture of the network the training problem can be either nonlinear-in-the-parameters
which involves optimisation or linear-in-the-parameters which can be solved using classical
approaches. The training algorithms can be categorised into supervised, unsupervised, or
reinforcement learning. Neural networks have excellent approximation properties but these are
usually based on standard function approximation results using for example
the Weierstrass Theorem that applies equally well to polynomials, rational functions, and other
well-known models. Neural networks have been applied extensively to system identification
problems which involve nonlinear and dynamic relationships. However, classical neural networks are purely static approximating machines: there are no dynamics within the network. Hence
when fitting dynamic models all the dynamics arise by allocating lagged inputs and outputs to the
input layer of the network. The training procedure then produces the best static approximation
that relates the lagged variables assigned to the input nodes to the output. There are more
complex network architectures, including recurrent networks,[1] that produce dynamics by
introducing increasing orders of lagged variables to the input nodes. But in these cases it is very easy to over-specify the lags, and this can lead to overfitting and poor generalisation properties. Neural networks have several advantages: they are conceptually simple, easy to train and to use, and have excellent approximation properties; the concept of local and parallel processing is important and provides integrity and fault-tolerant behaviour. The biggest criticism of the classical neural
network models is that the models produced are completely opaque and usually cannot be written
down or analysed. It is therefore very difficult to know what is causing what, to analyse the model,
or to compute dynamic characteristics from the model. Some of these points will not be relevant
to all applications but they are for dynamic modelling.
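The lagged-regressor approach described above can be sketched as follows. The example assumes scikit-learn's MLPRegressor as the feedforward network and a simple simulated nonlinear system; both are illustrative choices, not anything prescribed by the text.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N = 1000
u = rng.uniform(-1, 1, N)
y = np.zeros(N)
for k in range(2, N):  # a hypothetical nonlinear dynamic system
    y[k] = (0.6 * y[k - 1] - 0.1 * y[k - 2]
            + np.tanh(u[k - 1]) + 0.02 * rng.standard_normal())

# All dynamics are supplied by assigning lagged inputs and outputs
# to the input layer of the (otherwise static) network
X = np.column_stack([y[1:-1], y[:-2], u[1:-1]])  # y(k-1), y(k-2), u(k-1)
target = y[2:]                                   # y(k)

net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(X[:700], target[:700])   # supervised training on the prediction error
print("held-out one-step-ahead R^2:", net.score(X[700:], target[700:]))
```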

NARMAX methods

The nonlinear autoregressive moving average model with exogenous inputs (NARMAX model) can represent a wide class of nonlinear systems,[2] and is defined as

$$y(k) = F\big[\,y(k-1), \ldots, y(k-n_y),\; u(k-d), \ldots, u(k-d-n_u),\; e(k-1), \ldots, e(k-n_e)\,\big] + e(k)$$

where y(k), u(k) and e(k) are the system output, input, and noise sequences respectively; $n_y$, $n_u$, and $n_e$ are the maximum lags for the system output, input and noise; F[•] is some nonlinear function; and d is a time delay typically set to d = 1. The model is essentially an expansion of past inputs, outputs
and noise terms. Because the noise is modelled explicitly, unbiased estimates of the system model
can be obtained in the presence of unobserved highly correlated and nonlinear noise. The Volterra,
the block structured models and many neural network architectures can all be considered as
subsets of the NARMAX model. Since NARMAX was introduced, by proving what class of nonlinear
systems can be represented by this model, many results and algorithms have been derived based
around this description. Most of the early work was based on polynomial expansions of the
NARMAX model. These are still the most popular methods today but other more complex forms
based on wavelets and other expansions have been introduced to represent severely nonlinear
and highly complex nonlinear systems. A significant proportion of nonlinear systems can be
represented by a NARMAX model including systems with exotic behaviours such
as chaos, bifurcations, and subharmonics. While NARMAX started as the name of a model it has
now developed into a philosophy of nonlinear system identification.[2] The NARMAX approach
consists of several steps:

• Structure detection: which terms are in the model

• Parameter estimation: determine the model coefficients

• Model validation: is the model unbiased and correct

• Prediction: what is the output at some future time

• Analysis: what are the dynamical properties of the system

Structure detection forms the most fundamental part of NARMAX. For example, a NARMAX model
which consists of one lagged input and one lagged output term, three lagged noise terms,
expanded as a cubic polynomial would consist of eighty two possible candidate terms. This number
of candidate terms arises because the expansion by definition includes all possible combinations
within the cubic expansion. Naively proceeding to estimate a model which includes all these terms
and then pruning will cause numerical and computational problems and should always be avoided.
However, only a few terms are often important in the model. Structure detection, which aims to
select terms one at a time, is therefore critically important. These objectives can easily be achieved
by using the Orthogonal Least Squares[2] algorithm and its derivatives to select the NARMAX model
terms one at a time. These ideas can also be adapted for pattern recognition and feature
selection and provide an alternative to principal component analysis but with the advantage that
the features are revealed as basis functions that are easily related back to the original problem.
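A greatly simplified sketch of this term-selection idea is given below: candidate polynomial terms are built from lagged inputs and outputs, and terms are picked one at a time by a greedy error-reduction criterion. This is only a stand-in for the full orthogonal least squares / error-reduction-ratio procedure (no noise model, no explicit orthogonalisation), and the simulated system and candidate dictionary are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)
N = 600
u = rng.uniform(-1, 1, N)
y = np.zeros(N)
for k in range(1, N):  # true system: only two terms actually matter
    y[k] = 0.5 * y[k - 1] + 0.8 * u[k - 1] ** 2 + 0.02 * rng.standard_normal()

# Candidate dictionary: all monomials up to degree 3 in {y(k-1), u(k-1)}
regressors = {"y(k-1)": y[:-1], "u(k-1)": u[:-1]}
target = y[1:]
candidates = {}
for degree in range(1, 4):
    for combo in combinations_with_replacement(regressors, degree):
        name = "*".join(combo)
        candidates[name] = np.prod([regressors[r] for r in combo], axis=0)

# Greedy forward selection: pick the term giving the largest error reduction,
# refit by least squares, and repeat (a simplified stand-in for OLS/ERR)
selected, residual = [], target.copy()
for _ in range(2):
    best = max(candidates, key=lambda n: (candidates[n] @ residual) ** 2
                                         / (candidates[n] @ candidates[n]))
    selected.append(best)
    X = np.column_stack([candidates[s] for s in selected])
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ theta

print("selected terms:", selected)
print("estimated coefficients:", theta)
```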

NARMAX methods are designed to do more than find the best approximating model. System
identification can be divided into two aims. The first involves approximation where the key aim is
to develop a model that approximates the data set such that good predictions can be made. There
are many applications where this approach is appropriate, for example in time series prediction of
the weather, stock prices, speech, target tracking, pattern classification etc. In such applications
the form of the model is not that important. The objective is to find an approximation scheme
which produces the minimum prediction errors. A second objective of system identification, which
includes the first objective as a subset, involves much more than just finding a model to achieve
the best mean squared errors. This second aim is why the NARMAX philosophy was developed and
is linked to the idea of finding the simplest model structure. The aim here is to develop models
that reproduce the dynamic characteristics of the underlying system, to find the simplest possible
model, and if possible to relate this to components and behaviours of the system under study. The
core aim of this second approach to identification is therefore to identify and reveal the rule that
represents the system. These objectives are relevant to model simulation and control systems
design, but increasingly to applications in medicine, neuroscience, and the life sciences. Here the
aim is to identify models, often nonlinear, that can be used to understand the basic mechanisms of
how these systems operate and behave so that we can manipulate and utilise these. NARMAX
methods have also been developed in the frequency and spatio-temporal domains.
