IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 22, NO. 12, DECEMBER 2011, p. 2189

Data-Based Identification and Control of Nonlinear Systems via Piecewise Affine Approximation
Chow Yin Lai, Cheng Xiang, Member, IEEE, and Tong Heng Lee, Member, IEEE

Manuscript received January 14, 2011; revised November 1, 2011; accepted November 1, 2011. Date of publication November 30, 2011; date of current version December 13, 2011.

C. Y. Lai is with the National University of Singapore Graduate School for Integrative Sciences and Engineering, 117456, Singapore. He is also with the Singapore Institute of Manufacturing Technology, Agency for Science, Technology and Research, 638075, Singapore (e-mail: g0601819@nus.edu.sg).

C. Xiang and T. H. Lee are with the Department of Electrical and Computer Engineering, National University of Singapore, 119260, Singapore (e-mail: elexc@nus.edu.sg; eleleeth@nus.edu.sg).

Digital Object Identifier 10.1109/TNN.2011.2175946

1045–9227/$26.00 © 2011 IEEE

Abstract: The piecewise affine (PWA) model is an attractive model structure for approximating nonlinear systems. This paper proposes a procedure for obtaining PWA autoregressive exogenous (PWARX) models of nonlinear systems, i.e., PWA models whose local subsystems are autoregressive with exogenous inputs. A PWARX model is defined by two key parameters: the parameters of the locally affine subsystems and the partition of the regressor space. The former are estimated with a least-squares-based identification method that uses multiple models. The latter is estimated with standard procedures such as a neural network classifier or a support vector machine classifier. Once the PWARX model of the nonlinear system has been obtained, a controller is derived that makes the system track a reference. Both simulation and experimental studies show that the proposed algorithm provides accurate PWA approximations of nonlinear systems and that the designed controller provides good tracking performance.

Index Terms: Nonlinear systems, piecewise affine models, reference tracking, switching systems, system identification, weighted least squares.

I. INTRODUCTION

Piecewise affine (PWA) systems are systems whose state-input domain is partitioned into a finite number of nonoverlapping regions, and the subsystem in each region is linear or affine [1]–[3]. If each regional subsystem has an autoregressive exogenous (ARX) input–output relationship, i.e., it is autoregressive with exogenous inputs, the system is called a piecewise affine ARX (PWARX) system [2], [3]. PWA systems have received much attention from researchers for two main reasons. First, they are equivalent to several classes of hybrid models [4], so they can be used to obtain hybrid models from data. Typical examples of hybrid systems include manufacturing systems, telecommunication networks, traffic control systems, digital circuits, and logistic systems [5]. Second, PWA models form an attractive model structure that can approximate nonlinear dynamical systems by switching among various linear/affine models [1], [3], [6]. This makes them useful for the controller design of nonlinear systems. Linear controllers for the affine subsystems can first be designed with any of the well-known linear control synthesis methods. The controllers can then switch from one to another according to the operating region of the nonlinear system.

A nonlinear system is approximated by a PWARX model as follows. Given a set of input–output time-series data of the nonlinear system, the data are fitted with a family of affine models. The main difficulty is that the identification problem is coupled with a data classification problem: each data point must be associated with the most suitable submodel [3]. The central challenge in identifying PWARX models is therefore that both the parameters of the affine subsystems and the partitions of the regressor domain have to be estimated [3].

Several methods to identify PWA and PWARX models have been proposed in the literature [7]–[12].

- In [7] and [12], clustering techniques group the input–output data and then estimate a parameter vector for each cluster. When the model orders are not known exactly, however, the identification results become very poor, because irrelevant information corrupts the distances in the feature space.
- The algebraic-geometric approach in [8] transforms the multiple ARX models into a single "lifted" ARX model that does not depend on the switching sequence. The parameters of the lifted model are identified with standard linear identification techniques, and the parameters of the original ARX subsystems are then recovered. This approach gives a closed-form solution in the absence of noise, but the algorithm has been observed to be rather sensitive to noise and nonlinear disturbances [2].
- In [9], the identification of two subclasses of PWA models is formulated as mixed-integer linear or quadratic problems, which are then solved with available optimization algorithms. These problems are guaranteed to be solved to the global optimum, but the worst-case complexity is very high. The procedure is therefore suitable only when relatively few data are available.
- In [10], a Bayesian procedure estimates the parameter vectors, which are treated as random variables described by their probability density functions. The procedure works well when sufficient physical insight into the underlying data-generating process is available, but poor initialization may lead to poor identification results.
- The bounded-error procedure in [11] fits a PWARX model satisfying $|e(t)| = |y_{t+1} - f(r_t)| \le \delta$ without any assumption on the system generating the data. It is well suited to cases with no a priori knowledge of the system, where one wishes to identify a model whose prediction error stays within a prescribed bound $\delta$.
However, it may be difficult to find the right combination of tuning parameters. The reader is also referred to [2] and [3], which summarize and compare some of the procedures mentioned above.

By far the most popular control methodology for PWA systems, switching systems, and hybrid systems is model predictive control [5], [13]–[24]. As in a conventional model predictive controller for a single-model system, the control signal is computed by minimizing a cost function that penalizes the future output error and the control signal. An additional consideration, however, is the switching of the model's subsystems and of the corresponding local controllers. Some researchers treat the activation and deactivation of each local controller as a discrete state taking the value 1 or 0. They include this discrete state in the cost function and solve a mixed-integer linear program or a mixed-integer quadratic program [13]. Because mixed-integer programming is computationally complex, these controllers may be difficult to implement in real time. As a remedy, [14], [15], and [21] recast the optimization problem as multiparametric mixed-integer programming. This problem is solved offline, and the optimal control signal is obtained as an explicit function of the states. In real-time implementation, the optimal control signal is then obtained through a simple function evaluation. Unfortunately, the method does not lend itself well to general tracking problems [25]. This may explain why all of the simulation examples given in [14], [15], and [21] consider set-point regulation only.

In this paper, a simple yet reliable procedure for obtaining PWARX models of nonlinear systems is proposed. The two key parameters defining a PWARX model are the parameters of the locally affine subsystems and the partition of the regressor space. The former are estimated with a least-squares-based identification method that uses multiple models. The latter is estimated with standard procedures such as a neural network classifier or a support vector machine classifier. Based on the identified PWARX model, a one-step-ahead predictive controller is then designed to make the nonlinear system track a reference. Simulation and experimental studies show that the proposed algorithm provides accurate PWA approximations of nonlinear systems and that the designed controller provides good tracking performance. This paper builds upon our preliminary study on the identification and control of time-varying and nonlinear systems [26], [27].

The rest of this paper is organized as follows. Section II gives the problem formulation. Sections III and IV detail the two steps of the proposed identification algorithm for a PWARX model, and Section V derives the control law. Simulation and experimental results are shown in Sections VI and VII, respectively. Finally, conclusions are drawn in Section VIII.

II. PROBLEM FORMULATION

Consider a nonlinear system

$$
\begin{aligned}
\xi(t+1) &= f\big(\xi(t), u(t)\big)\\
y(t) &= g\big(\xi(t)\big)
\end{aligned}
\tag{1}
$$

where the output $y(t) \in \mathbb{R}$, the states $\xi(t) \in \mathbb{R}^n$, and the input $u(t) \in \mathbb{R}$ are discrete-time sequences. Assume that system (1) can also be described by the following nonlinear autoregressive moving average (NARMA) model:

$$ y(t+1) = F\big(y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)\big). \tag{2} $$

The reader may refer to [28] for a detailed discussion of the conditions under which such global input–output models exist for nonlinear systems.

Given a set of input–output data from this nonlinear system, our aim is to approximate $F(\cdot)$ by a PWARX model with $N$ partitions such that

$$
y(t+1) \approx
\begin{cases}
\phi^T(t)\,\theta_1^* & \text{if } x(t) \in X_1\\
\quad\vdots & \quad\vdots\\
\phi^T(t)\,\theta_m^* & \text{if } x(t) \in X_m,\ m \in \{1, \dots, N\}\\
\quad\vdots & \quad\vdots\\
\phi^T(t)\,\theta_N^* & \text{if } x(t) \in X_N
\end{cases}
\tag{3}
$$

where

$$ x(t) = [y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)]^T \tag{4} $$

$$ \phi(t) = [x^T(t), 1]^T \tag{5} $$

and

$$ \bigcup_{i=1}^{N} X_i = X \quad \text{and} \quad X_i \cap X_j = \emptyset,\ \forall i \neq j \tag{6} $$

where $X$ denotes the whole regressor space. To obtain the complete representation of the PWARX model, we need to identify the parameters of each subsystem, $\theta_1^*, \dots, \theta_N^*$, and to estimate the regions $X_1, \dots, X_N$. Based on the identified PWARX model, we shall then design a controller for the nonlinear system.

As mentioned in the Introduction, identifying PWA models is very challenging because parameter identification is coupled with a data classification problem: each data point must be associated with the most suitable submodel. If the partitions of the regressor space are known a priori, the problem is easy to solve. The data can be separated into groups (one per submodel), and the parameters of each submodel can be identified with standard linear identification methods. If the partitions are unknown, however, identifying a PWA system is a formidable task. This paper proposes a method to identify PWA systems with unknown partitions, described in the next two sections.

III. FIRST STEP OF IDENTIFICATION PROCEDURE: PARAMETER IDENTIFICATION

In the first step of our identification algorithm, the parameters of the subsystems are identified. The partitions of the regressor domain are estimated only in the second step.

A. Defining the Cost Functions

Assume, for the moment, that the considered system is truly PWA and that there is no measurement noise, i.e., the approximation sign "≈" in (3) is replaced by the equality
sign "=". The first question is how to construct a cost function whose global minimum (zero) is attained at the true model parameter vectors $\theta_1^*, \dots, \theta_N^*$. Among the many cost functions that satisfy this requirement, two effective ones were introduced in our earlier work [26], [27]:

1) the geometrical mean of the squares of errors

$$ J_g(\theta_1, \dots, \theta_m, \dots, \theta_N) = \frac{1}{2} \sum_{i=1}^{T} e_1^2(i) \cdots e_m^2(i) \cdots e_N^2(i) \tag{7} $$

2) the harmonic mean of the squares of errors

$$ J_h(\theta_1, \dots, \theta_m, \dots, \theta_N) = \frac{1}{2} \sum_{i=1}^{T} \frac{1}{\dfrac{1}{e_1^2(i)} + \cdots + \dfrac{1}{e_m^2(i)} + \cdots + \dfrac{1}{e_N^2(i)}} \tag{8} $$

where $T$ denotes the number of observations and $e_m(i)$ is the identification error of the $m$th model

$$ e_m(i) = y(i+1) - \phi^T(i)\,\theta_m. \tag{9} $$

Suppose that, at every time instant, one of the parameter estimates $\theta_1, \dots, \theta_N$ equals the true value $\theta_m^*$, where $m \in \{1, \dots, N\}$ is determined by where the regressor lies in the regressor domain. Then at least one of the identification errors is zero at every instant, and the cost functions (7) and (8) are both zero.

For the nonlinear system [i.e., (3) with the approximation sign "≈"], zero is no longer the global minimum of (7) and (8). However, if we can drive these cost functions toward their global minimum, we obtain sufficiently accurate parameters for the PWARX model.

B. New Perspective on the Cost Functions

Having defined the above cost functions, the next step is to minimize them, which recovers the parameters of the subsystems. The cost functions are not quadratic in the parameters. There is therefore no direct way to find their global minimum with conventional optimization algorithms such as gradient descent and conjugate gradient methods.

We therefore propose a new perspective on the cost functions that allows well-known algorithms to be used to identify the subsystem parameters. This perspective is inspired by the well-known EM algorithm [29]. EM handles the incomplete-data problem by solving the optimization problem recursively, treating the unknown parameters at each iteration as if they were the true values. The key observation is as follows. If all of the parameters in (7) are fixed except the one being estimated at a particular moment, the cost function becomes quadratic in that parameter:

$$
J_g(\theta_1, \dots, \theta_m, \dots, \theta_N) = \frac{1}{2} \sum_{i=1}^{T} \underbrace{e_1^2(i) \cdots}_{\text{fixed}}\; e_m^2(i)\; \underbrace{\cdots e_N^2(i)}_{\text{fixed}} = \frac{1}{2} \sum_{i=1}^{T} w_m^2(i)\, e_m^2(i) \tag{10}
$$

where

$$ w_m^2(i) = \prod_{j=1,\, j \neq m}^{N} e_j^2(i). \tag{11} $$

Similarly, the cost function (8) can be transformed into

$$ J_h(\theta_1, \dots, \theta_m, \dots, \theta_N) = \frac{1}{2} \sum_{i=1}^{T} w_m^2(i)\, e_m^2(i) \tag{12} $$

by fixing

$$ w_m^2(i) = \frac{\dfrac{1}{e_m^2(i)}}{\dfrac{1}{e_1^2(i)} + \cdots + \dfrac{1}{e_m^2(i)} + \cdots + \dfrac{1}{e_N^2(i)}}. \tag{13} $$

From this perspective, the geometrical mean squares (7) and the harmonic mean squares (8) become the usual weighted least squares (10) and (12), respectively. This change of perspective is what allows the identification algorithms for PWARX systems to be derived. The learning algorithms are detailed in the following subsections.

C. Least Geometrical Mean Squares

To identify the parameters of the $N$ subsystems, we propose to minimize the cost function (7) iteratively. Assume that the parameter vectors at the $k$th iteration are $\theta_{1,k}, \dots, \theta_{m,k}, \dots, \theta_{N,k}$. The cost function is then

$$ J_g(k) = J_g(\theta_{1,k}, \dots, \theta_{m,k}, \dots, \theta_{N,k}) = \frac{1}{2} \sum_{i=1}^{T} e_{1,k}^2(i) \cdots e_{m,k}^2(i) \cdots e_{N,k}^2(i) \tag{14} $$

where $e_{m,k}(i)$ is the identification error of the $m$th model at the $k$th step

$$ e_{m,k}(t) = y(t+1) - \phi^T(t)\,\theta_{m,k}. \tag{15} $$

Following the discussion in Section III-B, (14) can be transformed into a weighted-least-squares cost function. We propose a simple algorithm under which the value of the cost function is guaranteed not to increase from one iteration to the next. In each iteration, the $N$ parameter vectors are estimated separately in $N$ steps, as follows.

First, $\theta_{1,k+1}$ is computed by minimizing (14) with the parameters of all the other models, $\theta_{2,k}, \dots, \theta_{N,k}$, held fixed. The cost function can then be rewritten as

$$
\begin{aligned}
J_g(\theta_1) &= \frac{1}{2} \sum_{t=1}^{T} \big(y(t+1) - \phi^T(t)\,\theta_1\big)^2 e_{2,k}^2(t) \cdots e_{N,k}^2(t)\\
&= \frac{1}{2} \sum_{t=1}^{T} \big(y(t+1) - \phi^T(t)\,\theta_1\big)^2 w_{1,k}^2(t)\\
&= \frac{1}{2} \sum_{t=1}^{T} \big(w_{1,k}(t)\,y(t+1) - w_{1,k}(t)\,\phi^T(t)\,\theta_1\big)^2
\end{aligned}
\tag{16}
$$
where

$$ w_{1,k}^2(t) = \prod_{j=2}^{N} e_{j,k}^2(t). \tag{17} $$

Assuming that the data are sufficiently rich, the unique optimal solution is

$$ \hat{\theta}_1^* = (\Phi_{1,k}^T \Phi_{1,k})^{-1} \Phi_{1,k}^T Y_{1,k} \tag{18} $$

where

$$ Y_{1,k} = \big(w_{1,k}(1)\,y(2),\ w_{1,k}(2)\,y(3),\ \dots,\ w_{1,k}(T)\,y(T+1)\big)^T \tag{19} $$

$$ \Phi_{1,k} = \begin{pmatrix} w_{1,k}(1)\,\phi^T(1) \\ \vdots \\ w_{1,k}(T)\,\phi^T(T) \end{pmatrix}. \tag{20} $$

Thus, let

$$ \theta_{1,k+1} = (\Phi_{1,k}^T \Phi_{1,k})^{-1} \Phi_{1,k}^T Y_{1,k}. \tag{21} $$

The second step updates the parameters of the second model, again with the parameters of all the other models held fixed. It is better to use the latest update $\theta_{1,k+1}$ rather than the previous estimate $\theta_{1,k}$. To do so, the identification error of the first model is recalculated as

$$ e_{1,k+1}(t) = y(t+1) - \phi^T(t)\,\theta_{1,k+1}. \tag{22} $$

The corresponding cost function is then

$$
\begin{aligned}
J_g(\theta_2) &= \frac{1}{2} \sum_{t=1}^{T} \big(y(t+1) - \phi^T(t)\,\theta_2\big)^2 e_{1,k+1}^2(t)\, e_{3,k}^2(t) \cdots e_{N,k}^2(t)\\
&= \frac{1}{2} \sum_{t=1}^{T} \big(y(t+1) - \phi^T(t)\,\theta_2\big)^2 w_{2,k}^2(t)\\
&= \frac{1}{2} \sum_{t=1}^{T} \big(w_{2,k}(t)\,y(t+1) - w_{2,k}(t)\,\phi^T(t)\,\theta_2\big)^2
\end{aligned}
\tag{23}
$$

where

$$ w_{2,k}^2(t) = e_{1,k+1}^2(t)\, e_{3,k}^2(t) \cdots e_{N,k}^2(t). \tag{24} $$

The optimal solution to this problem is then

$$ \hat{\theta}_2^* = (\Phi_{2,k}^T \Phi_{2,k})^{-1} \Phi_{2,k}^T Y_{2,k} \tag{25} $$

where

$$ Y_{2,k} = \big(w_{2,k}(1)\,y(2),\ w_{2,k}(2)\,y(3),\ \dots,\ w_{2,k}(T)\,y(T+1)\big)^T \tag{26} $$

$$ \Phi_{2,k} = \begin{pmatrix} w_{2,k}(1)\,\phi^T(1) \\ \vdots \\ w_{2,k}(T)\,\phi^T(T) \end{pmatrix}. \tag{27} $$

Thus, let

$$ \theta_{2,k+1} = (\Phi_{2,k}^T \Phi_{2,k})^{-1} \Phi_{2,k}^T Y_{2,k}. \tag{28} $$

In general, the $m$th step updates the parameters of the $m$th model with the parameters of all the other models held fixed. As before, it is better to use the latest updates $\theta_{1,k+1}, \dots, \theta_{m-1,k+1}$ rather than the previous estimates:

$$ \theta_{m,k+1} = (\Phi_{m,k}^T \Phi_{m,k})^{-1} \Phi_{m,k}^T Y_{m,k} \tag{29} $$

where

$$ Y_{m,k} = \big(w_{m,k}(1)\,y(2),\ w_{m,k}(2)\,y(3),\ \dots,\ w_{m,k}(T)\,y(T+1)\big)^T \tag{30} $$

$$ \Phi_{m,k} = \begin{pmatrix} w_{m,k}(1)\,\phi^T(1) \\ \vdots \\ w_{m,k}(T)\,\phi^T(T) \end{pmatrix} \tag{31} $$

$$ w_{m,k}^2(t) = \left(\prod_{j=1}^{m-1} e_{j,k+1}^2(t)\right) \left(\prod_{j=m+1}^{N} e_{j,k}^2(t)\right). \tag{32} $$

At each step, the updated parameter vector is the unique global minimizer of the corresponding quadratic subproblem. We therefore have

$$ J_g(\theta_{1,k+1}, \theta_{2,k}, \dots, \theta_{N,k}) \le J_g(\theta_{1,k}, \theta_{2,k}, \dots, \theta_{N,k}) \tag{33} $$

$$ J_g(\theta_{1,k+1}, \theta_{2,k+1}, \dots, \theta_{N,k}) \le J_g(\theta_{1,k+1}, \theta_{2,k}, \dots, \theta_{N,k}) \tag{34} $$

$$ \vdots $$

$$ J_g(\theta_{1,k+1}, \theta_{2,k+1}, \dots, \theta_{m,k+1}, \dots, \theta_{N,k}) \le J_g(\theta_{1,k+1}, \theta_{2,k+1}, \dots, \theta_{m,k}, \dots, \theta_{N,k}) \tag{35} $$

$$ \vdots $$

Therefore, the cost function is guaranteed not to increase from one iteration to the next:

$$ J_g(\theta_{1,k+1}, \theta_{2,k+1}, \dots, \theta_{N,k+1}) \le J_g(\theta_{1,k}, \theta_{2,k}, \dots, \theta_{N,k}). \tag{36} $$

Remark 1: Since $J_g$ is bounded below by zero, inequality (36) implies that the cost function converges. However, there is as yet no mathematical proof that it converges to zero. Nevertheless, extensive simulation studies indicate that the cost function does converge to zero when the data are rich and there is no measurement noise. A complete convergence analysis will be part of our future work.

D. Least Harmonic Mean Squares (LHM)

An algorithm to minimize the harmonic mean squares (8) can be derived similarly. Owing to space limitations, only a brief outline of the algorithm is given.

Assume that the parameter vectors at the $k$th iteration are $\theta_{1,k}, \dots, \theta_{m,k}, \dots, \theta_{N,k}$. The cost function is then

$$ J_h(k) = J_h(\theta_{1,k}, \dots, \theta_{m,k}, \dots, \theta_{N,k}) = \frac{1}{2} \sum_{i=1}^{T} \frac{1}{\dfrac{1}{e_{1,k}^2(i)} + \cdots + \dfrac{1}{e_{m,k}^2(i)} + \cdots + \dfrac{1}{e_{N,k}^2(i)}}. \tag{37} $$

As in Section III-C, the $m$th step should use the latest updates $\theta_{1,k+1}, \dots, \theta_{m-1,k+1}$ rather than the previous estimates.
Thus, fix

$$ w_{m,k}^2(i) = \frac{\dfrac{1}{e_{m,k}^2(i)}}{\dfrac{1}{e_{1,k+1}^2(i)} + \cdots + \dfrac{1}{e_{m-1,k+1}^2(i)} + \dfrac{1}{e_{m,k}^2(i)} + \cdots + \dfrac{1}{e_{N,k}^2(i)}}. \tag{38} $$

The estimate for $\theta_m^*$ can then be updated using (29), (30), and (31).

[Fig. 1. Data classifier for estimation of partition of regressor space. (a) Classifier I: inputs $y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)$; output $\sigma(t) = h(x(t))$. (b) Classifier II: inputs $y(t), \dots, y(t-n_a), u(t-1), \dots, u(t-n_b)$; output $\sigma(t) = h(\tilde{x}(t))$.]
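The following minimal Python/NumPy sketch illustrates the block-coordinate structure of the least geometrical mean squares iteration of Section III-C. It is an illustration under stated assumptions, not the authors' implementation. The function name `lgm_identify`, the random initialization, the fixed number of sweeps, and the two-region toy data are all hypothetical choices made here. In addition, `np.linalg.lstsq` replaces the explicit inverse in (29) for numerical robustness. Each inner step recomputes the errors with the latest estimates and takes $w_m(t)$ as the nonnegative square root of (32). It then solves the weighted least-squares problem (29)–(31) exactly, so the recorded cost $J_g(k)$ should be nonincreasing, in line with (36).

```python
import numpy as np


def lgm_identify(Phi, y_next, n_models, n_iter=30, seed=0):
    """Sketch of least geometrical mean squares as block-coordinate weighted LS.

    Phi    : (T, d) array; row t is phi(t)^T = [x(t)^T, 1]
    y_next : (T,) array; entry t is y(t + 1)
    Returns the N parameter vectors and the cost history J_g(0), J_g(1), ...
    """
    rng = np.random.default_rng(seed)
    d = Phi.shape[1]
    # Hypothetical initialisation (not specified in this section): random vectors.
    thetas = [rng.normal(size=d) for _ in range(n_models)]

    def errors():
        # Row m holds e_m(t) = y(t + 1) - phi(t)^T theta_m, cf. (9) and (15).
        return np.stack([y_next - Phi @ th for th in thetas])

    def cost(E):
        # J_g = 1/2 * sum_t prod_m e_m(t)^2, cf. (7) and (14).
        return 0.5 * float(np.sum(np.prod(E ** 2, axis=0)))

    history = [cost(errors())]
    for _ in range(n_iter):
        for m in range(n_models):
            # Recomputing here means models 1..m-1 already use their
            # (k+1)th estimates and models m+1..N their kth ones, as in (32).
            E = errors()
            # w_m(t) = prod_{j != m} |e_j(t)|, the nonnegative root of (32).
            w = np.prod(np.abs(np.delete(E, m, axis=0)), axis=0)
            # Weighted LS of (29)-(31): rows w(t) phi(t)^T, targets w(t) y(t + 1).
            A = w[:, None] * Phi
            b = w * y_next
            thetas[m] = np.linalg.lstsq(A, b, rcond=None)[0]
        history.append(cost(errors()))
    return thetas, history


# Hypothetical noise-free PWA toy data: two affine pieces in a scalar regressor x.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y_next = np.where(x < 0.0, 0.5 * x + 1.0, -0.8 * x + 0.2)
Phi = np.column_stack([x, np.ones_like(x)])

thetas, hist = lgm_identify(Phi, y_next, n_models=2)
print(f"J_g: initial {hist[0]:.3e}, final {hist[-1]:.3e}")
```

Each update exactly minimizes $J_g$ over one $\theta_m$ with the others held fixed, so the cost history should not increase beyond floating-point rounding. Whether it actually reaches zero on noise-free PWA data is the open question noted in Remark 1. An LHM variant would replace the weight line with the fixed weights (38). Those weights themselves depend on $\theta_m$, however, so the monotonicity argument above does not carry over directly.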

E. Discussion on the Number of Subsystems

The previous sections assume that the number of models, $N$, is known. If it is not known a priori, how should a suitable $N$ be chosen?

One way to gauge the suitability of a particular $N$ is the fitting accuracy of the models on both training and test data. On the training data, a larger $N$ intuitively gives a more accurate fit. Because of overfitting, however, this improvement may not carry over to the test set. We can therefore try a few different values of $N$ and choose the one that best balances the goodness of fit on the training data against that on the test data.

Another consideration is that the computational burden grows as $N$ gets larger. However, the above identification algorithms are essentially offline methods, so some computational burden can be tolerated without much concern for computation time. For the calculation of the control signal (see Section V), we propose a method that avoids heavy computation and can be implemented in real time even if $N$ is large. It is therefore more important to choose $N$ by the fitting criterion above (fit on both training and test data) than by the computational load.

IV. SECOND STEP OF IDENTIFICATION PROCEDURE: ESTIMATION OF THE PARTITION OF REGRESSOR SPACE

Once the parameters of the individual subsystems have been identified, the next step is to estimate the partitioned regions $X_1, \dots, X_N$. The estimated regions allow the currently active subsystem to be recognized from the regression vector $x(t)$, so that the next output can be predicted with that subsystem.

This can be formulated as a standard pattern classification problem, whose steps are briefly outlined as follows. Each (training) data point is first labeled with the model that gives the smallest prediction error, i.e., $\text{class}(i\text{th data point}) = \arg\min_{1 \le j \le N} |e_j(i)|$. The data are then grouped into classes according to these labels. Standard algorithms such as multiclass support vector machine classifiers and neural network classifiers are then trained, with $x(t)$ as input and the class label as output, to classify the data and estimate the boundaries of the partitions. We write $\sigma(t) = h(x(t))$, where $h$ is the function implemented by the data classifier and $\sigma(t) \in \{1, \dots, N\}$ denotes the active subsystem.

A. Standard Regressor Space: Classifier I

According to the standard definition of the PWARX system (3), the regression vector is $x(t) = [y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)]^T$, so $x(t)$ can be used directly as the input of the data classifier. Hereafter, the classifier in this case is called "Classifier I" [see Fig. 1(a)].

B. Modified Regressor Space: Classifier II

For reasons explained in Section V, including $u(t)$ in the regression vector $x(t)$ of Classifier I prevents efficient use of the predictive controller. We therefore propose to use the modified regressor $\tilde{x}(t) = [y(t), \dots, y(t-n_a), u(t-1), \dots, u(t-n_b)]^T$ as the input to the data classifier. This classifier is called "Classifier II" [see Fig. 1(b)].

The simulation and experimental studies later show that, for certain nonlinear systems, this modification is acceptable for identification and does not severely degrade the accuracy of the PWA model. Moreover, using the modified regressor for data classification can improve the performance of the predictive controller significantly.

Remark 2: To avoid confusion, note that the modified regression vector applies only to the data classification step, i.e., $u(t)$ is omitted only when classifying the data points. The full regressor is still used for parameter identification. That is, (3) is rewritten as

$$
y(t+1) \approx
\begin{cases}
\phi^T(t)\,\theta_1^* & \text{if } \tilde{x}(t) \in \tilde{X}_1\\
\quad\vdots & \quad\vdots\\
\phi^T(t)\,\theta_m^* & \text{if } \tilde{x}(t) \in \tilde{X}_m,\ m \in \{1, \dots, N\}\\
\quad\vdots & \quad\vdots\\
\phi^T(t)\,\theta_N^* & \text{if } \tilde{x}(t) \in \tilde{X}_N
\end{cases}
\tag{39}
$$

where

$$ x(t) = [y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)]^T \tag{40} $$

$$ \phi(t) = [x^T(t), 1]^T \tag{41} $$

$$ \tilde{x}(t) = [y(t), \dots, y(t-n_a), u(t-1), \dots, u(t-n_b)]^T \tag{42} $$

and

$$ \bigcup_{i=1}^{N} \tilde{X}_i = \tilde{X} \quad \text{and} \quad \tilde{X}_i \cap \tilde{X}_j = \emptyset,\ \forall i \neq j. \tag{43} $$

C. Points on Boundary of Partitions and Measurement Noise

Two questions need to be addressed when estimating the partition of the regressor space. First, what should be done with points lying exactly on the boundary between two partitions? In the authors' view, such points pose little difficulty, for the following reasons. In the identification algorithm, the points
are grouped by the classifier (e.g., a support vector machine or neural network classifier). The boundary then consists of the points whose class cannot be decided, i.e., points at which the classifier outputs for two classes are exactly equal. The probability of this happening is zero. Even if it does happen, the classifier can still be forced to choose one of the classes. The points on the boundary therefore need no special treatment.

Second, how should measurement noise be handled, since it could make it difficult for the data classifier to estimate the partition? A soft margin can be used when training the data classifier [30], [31], which reduces the impact of measurement noise on data classification.

V. CONTROLLER DESIGN

After obtaining the PWARX model of the nonlinear system with the algorithm proposed in the previous sections, we can derive a control law for the system. A simple control design that showcases the accuracy of the model is the one-step-ahead predictive controller.

The PWARX model (3) can be written in the following unified form:

$$
\begin{aligned}
\hat{y}(t+1) &= \phi^T(t)\,\theta_{\sigma(t)}\\
&= a_{1,\sigma(t)}\,y(t) + \cdots + a_{n_a,\sigma(t)}\,y(t-n_a) + b_{1,\sigma(t)}\,u(t) + \cdots + b_{n_b,\sigma(t)}\,u(t-n_b) + c_{\sigma(t)}
\end{aligned}
\tag{44}
$$

where $\sigma(t)$ is the switching signal indicating the active region among $X_1, \dots, X_N$ (or $\tilde{X}_1, \dots, \tilde{X}_N$), and $\hat{y}(t+1)$ is the predicted output.

Assuming $b_{1,\sigma(t)} \neq 0$, the one-step-ahead control law that makes $y(t+1)$ track the reference $r(t+1)$ is

$$ u(t) = \frac{r(t+1) - a_{1,\sigma(t)}\,y(t) - \cdots - a_{n_a,\sigma(t)}\,y(t-n_a) - b_{2,\sigma(t)}\,u(t-1) - \cdots - b_{n_b,\sigma(t)}\,u(t-n_b) - c_{\sigma(t)}}{b_{1,\sigma(t)}} \tag{45} $$

This law minimizes $(r(t+1) - \hat{y}(t+1))^2$: it sets $\hat{y}(t+1) = r(t+1)$ in (44) and solves for $u(t)$.

To improve the stability of the one-step-ahead controller, we can instead use the weighted one-step-ahead control law

$$ u(t) = \frac{b_{1,\sigma(t)}\big[r(t+1) - a_{1,\sigma(t)}\,y(t) - \cdots - a_{n_a,\sigma(t)}\,y(t-n_a) - b_{2,\sigma(t)}\,u(t-1) - \cdots - b_{n_b,\sigma(t)}\,u(t-n_b) - c_{\sigma(t)}\big]}{b_{1,\sigma(t)}^2 + \lambda} \tag{46} $$

where $\lambda$ is a nonnegative number. This law is obtained by minimizing the following cost function:

$$ \min_{u(t)} J = \big(r(t+1) - a_{1,\sigma(t)}\,y(t) - \cdots - a_{n_a,\sigma(t)}\,y(t-n_a) - b_{1,\sigma(t)}\,u(t) - \cdots - b_{n_b,\sigma(t)}\,u(t-n_b) - c_{\sigma(t)}\big)^2 + \lambda u^2(t). \tag{47} $$

The control signal $u(t)$ in (45) or (46) depends on the switching signal $\sigma(t)$, which is the output of the data classifier. How $\sigma(t)$ is obtained depends on which of our two data classification methods is used.

A. Method I

The first idea is to use Classifier I to decide the currently active subsystem $\sigma(t)$. However, this leads to a difficult causality problem, a "chicken and egg situation." Since $\sigma(t) = h(x(t))$, deciding the current switching signal $\sigma(t)$ at time $t$ requires the regressor $x(t) = [y(t), \dots, y(t-n_a), u(t), \dots, u(t-n_b)]^T$. Yet $u(t)$, which is part of $x(t)$, has not been designed at this stage.

A simple solution is to assume that switching is infrequent, so that $\sigma(t) = \sigma(t-1)$. The previous regressor $x(t-1) = [y(t-1), \dots, y(t-n_a-1), u(t-1), \dots, u(t-n_b-1)]^T$ is then used to determine the active subsystem at time $t$. This approach naturally performs poorly if switching is frequent or the nonlinearity of the system is severe.

B. Method II

If Classifier I is to be used, a better way to overcome the "chicken and egg" problem is as follows. We first design $u_1(t)$ by minimizing the following constrained cost function:

$$
\begin{aligned}
\min_{u_1(t)} J = {} & \big(r(t+1) - a_{1,1}\,y(t) - \cdots - a_{n_a,1}\,y(t-n_a) - b_{1,1}\,u_1(t) - b_{2,1}\,u(t-1) - \cdots - b_{n_b,1}\,u(t-n_b) - c_1\big)^2 + \lambda u_1^2(t)\\
\text{s.t. } & \sigma(t) = h\big([y(t), \dots, y(t-n_a), u_1(t), u(t-1), \dots, u(t-n_b)]^T\big) = 1
\end{aligned}
\tag{48}
$$

i.e., the computed $u_1(t)$ is constrained so that the first subsystem is active. This optimization problem may or may not have a feasible solution.

Next, we design $u_2(t)$ by minimizing the following constrained cost function:

$$
\begin{aligned}
\min_{u_2(t)} J = {} & \big(r(t+1) - a_{1,2}\,y(t) - \cdots - a_{n_a,2}\,y(t-n_a) - b_{1,2}\,u_2(t) - b_{2,2}\,u(t-1) - \cdots - b_{n_b,2}\,u(t-n_b) - c_2\big)^2 + \lambda u_2^2(t)\\
\text{s.t. } & \sigma(t) = h\big([y(t), \dots, y(t-n_a), u_2(t), u(t-1), \dots, u(t-n_b)]^T\big) = 2
\end{aligned}
\tag{49}
$$

i.e., the computed $u_2(t)$ is constrained so that the second subsystem is active. Again, this optimization problem may or may not have a feasible solution.

The same procedure is repeated for all $N$ models. The results for subsystems whose optimization problem is infeasible can be discarded immediately. On the other
hand, for those that are feasible, the natural step is to compare the resulting cost values and choose the control signal with the minimum cost.

This approach resembles the methods described in [13] and [18]. Its biggest drawback is the heavy computational burden, because a nonlinear program must be solved at every step. The program is nonlinear because the constraint on the active subsystem is expressed through the data classifier. That classifier is either a nonlinear multilayer perceptron or a support vector machine with a possibly nonlinear kernel. This computational effort hinders the use of the approach on real-time systems.

C. Method III

The above "chicken and egg situation" motivated us to propose Classifier II for data classification. With the modified regressor $\tilde{x}(t) = [y(t), \dots, y(t-n_a), u(t-1), \dots, u(t-n_b)]^T$, $u(t)$ plays no part in deciding the active subsystem. There is therefore no need to solve the constrained optimization problem of Method II. Furthermore, unlike in Method I, the most recent output $y(t)$ is used to decide the current switching signal $\sigma(t)$. This should naturally improve the performance of the predictive controller.

D. Remark on the Stability of the System

Stability is clearly an important issue in control system design. However, analyzing the stability of nonlinear systems controlled by multiple linear controllers is a very challenging problem. For instance, it is well known that stability can be lost when the system switches frequently among its subsystems [32], [33]. This can happen even if each local linear controller stabilizes its own approximately linear region of the system. Also, although there are some stability results on model predictive controllers for hybrid […] applicable. These issues will be investigated as part of our future work.

VI. SIMULATION STUDIES

Simulation studies are carried out to demonstrate how well the proposed algorithm obtains PWARX models of nonlinear systems and how well the control law performs.

Consider the nonlinear system

$$
\begin{aligned}
\xi_1(t+1) &= \left(\frac{\xi_1(t)}{1 + \xi_1^2(t)} + 1\right)\sin\big(\xi_2(t)\big)\\
\xi_2(t+1) &= \xi_2(t)\cos\big(\xi_2(t)\big) + \xi_1(t)\,e^{-(\xi_1^2(t) + \xi_2^2(t))/8} + \frac{u^3(t)}{1 + u^2(t) + 0.5\cos\big(\xi_1(t) + \xi_2(t)\big)}\\
y(t) &= \frac{\xi_1(t)}{1 + 0.5\sin\big(\xi_2(t)\big)} + \frac{\xi_2(t)}{1 + 0.5\sin\big(\xi_1(t)\big)}.
\end{aligned}
\tag{50}
$$

This plant, taken from [34], does not represent any real system. It is, however, complex and nonlinear enough that conventional linear methods will not provide satisfactory performance.

1) Identification: The training set consists of 4000 data points, generated with a random excitation input in the range [−2.5, 2.5].

The PWARX model is of the form

$$
y(t+1) \approx
\begin{cases}
\phi^T(t)\,\theta_1^* & \text{if } x(t) \in X_1 \text{ or } \tilde{x}(t) \in \tilde{X}_1\\
\quad\vdots & \quad\vdots\\
\phi^T(t)\,\theta_N^* & \text{if } x(t) \in X_N \text{ or } \tilde{x}(t) \in \tilde{X}_N
\end{cases}
\tag{51}
$$

where

$$
\begin{aligned}
x(t) &= [y(t), y(t-1), y(t-2), u(t), u(t-1), u(t-2)]^T\\
\phi(t) &= [x^T(t), 1]^T\\
\tilde{x}(t) &= [y(t), y(t-1), y(t-2), u(t-1), u(t-2)]^T.
\end{aligned}
\tag{52}
$$

The parameters $\theta_1^*$ to $\theta_N^*$ are first identified with either the least geometrical mean algorithm or the least harmonic mean algorithm. A neural network is then trained for data classification, using either the standard regressor $x(t)$ or the modified regressor $\tilde{x}(t)$.

To verify how accurately the PWARX model approximates the original nonlinear system, the outputs of the nonlinear system and of the PWARX model are compared for the test input

$$ u(t) = \sin\left(\frac{2\pi t}{10}\right) + \sin\left(\frac{2\pi t}{25}\right). \tag{53} $$

[Fig. 2. Identification of the nonlinear system via PWARX model using Classifier II. Panels: fit using 1 affine model, and fits using PWARX models with 2, 4, and 6 subsystems. Solid: true output. Dashed: estimated output.]

Fig. 2 shows the simulation result for the test data when Classifier II chooses the active subsystem. As can be observed, the PWARX models with four and six subsystems approximate the nonlinear system well. The
systems, they all assume that the model is accurate. In our data-fitting by using Classifier I is just slightly better than by
case, since the PWARX model is an approximate model of the Classifier II, and the result is not shown here due to space
original nonlinear system, the stability results are not directly limitations.
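The identification setup above can be reproduced in outline with the following NumPy sketch, which simulates the benchmark plant (50) and stacks the regressors of (52). It is a minimal sketch under stated assumptions: the zero initial state, the uniform distribution of the random training input, and the 200-sample length of the test signal (53) are our choices, and the helper names (`plant_step`, `simulate`, `build_regressors`, `pwarx_predict`) are hypothetical. The least geometrical mean and least harmonic mean algorithms and the neural network classifier are not reproduced; `pwarx_predict` only shows how (51) is evaluated once some parameter vectors θ_i and some classifier are available, and the demo uses a single least-squares affine model as a baseline.

```python
import numpy as np


def plant_step(xi1, xi2, u):
    """State update of the benchmark plant (50)."""
    xi1_next = (xi1 / (1.0 + xi1**2) + 1.0) * np.sin(xi2)
    xi2_next = (xi2 * np.cos(xi2)
                + xi1 * np.exp(-(xi1**2 + xi2**2) / 8.0)
                + u**3 / (1.0 + u**2 + 0.5 * np.cos(xi1 + xi2)))
    return xi1_next, xi2_next


def plant_output(xi1, xi2):
    """Output map of (50)."""
    return xi1 / (1.0 + 0.5 * np.sin(xi2)) + xi2 / (1.0 + 0.5 * np.sin(xi1))


def simulate(u, xi0=(0.0, 0.0)):
    """Return y(0..len(u)-1) for input u; the zero initial state is an assumption."""
    xi1, xi2 = xi0
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        y[t] = plant_output(xi1, xi2)
        xi1, xi2 = plant_step(xi1, xi2, u_t)
    return y


def build_regressors(y, u):
    """Stack x(t), phi(t), x_tilde(t) of (52) and the target y(t+1), t = 2..N-2."""
    t = np.arange(2, len(y) - 1)
    x = np.column_stack([y[t], y[t - 1], y[t - 2], u[t], u[t - 1], u[t - 2]])
    phi = np.column_stack([x, np.ones(len(t))])
    x_tilde = np.column_stack([y[t], y[t - 1], y[t - 2], u[t - 1], u[t - 2]])
    return x, phi, x_tilde, y[t + 1]


def pwarx_predict(phi, x_tilde, thetas, classify):
    """Evaluate (51): i = classify(x_tilde(t)), prediction phi(t)^T theta_i."""
    idx = np.array([classify(row) for row in x_tilde], dtype=int)
    return np.einsum("ij,ij->i", phi, thetas[idx]), idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_train = rng.uniform(-2.5, 2.5, 4000)      # random excitation in [-2.5, 2.5]
    y_train = simulate(u_train)
    _, phi, x_tilde, target = build_regressors(y_train, u_train)
    # Baseline only: one affine model (N = 1) fitted by ordinary least squares.
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    y_hat, _ = pwarx_predict(phi, x_tilde, theta[None, :], lambda row: 0)
    print(phi.shape, float(np.mean((target - y_hat) ** 2)))
    t = np.arange(200)
    u_test = np.sin(2 * np.pi * t / 10) + np.sin(2 * np.pi * t / 25)   # test input (53)
    y_test = simulate(u_test)
```

Keeping the classifier as a plain callable means a trained multilayer perceptron or support vector machine could be dropped in without changing how (51) is evaluated.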
TABLE I
FIT VALUES

No. subsystems    1       2       4       6
Classifier I      16.5%   50.8%   68.4%   70.5%
Classifier II     16.5%   48.7%   62.2%   68.5%

Fig. 3. Control of the nonlinear system using Method I (panels: control using 1 affine model and using PWARX models with 2, 4, and 6 subsystems). Solid: reference. Dashed: output of nonlinear system.

Fig. 4. Control of the nonlinear system using Method III (panels: control using 1 affine model and using PWARX models with 2, 4, and 6 subsystems). Solid: reference. Dashed: output of nonlinear system.
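Table I reports the Fit value defined in (54). A minimal NumPy sketch of that formula follows, assuming the square-root (normalized root-sum-of-squares) form of (54); `fit_percent` is a hypothetical helper name, not taken from the paper.

```python
import numpy as np


def fit_percent(y, y_hat):
    """Fit value of (54): 100 * (1 - sqrt(sum (y - y_hat)^2 / sum (y - mean(y))^2))."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = np.sum((y - y_hat) ** 2)           # residual sum of squares
    spread = np.sum((y - y.mean()) ** 2)     # variation of y about its average
    return (1.0 - np.sqrt(err / spread)) * 100.0


print(fit_percent([0.0, 2.0], [0.5, 1.5]))   # 50.0
```

A perfect prediction gives 100%, predicting the average ȳ at every sample gives 0%, and predictions worse than the average give negative values.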

The quantitative accuracy of the PWARX models is measured by the “Fit” value, defined as

$$\text{Fit} = \left(1 - \sqrt{\frac{\sum_t \left(y(t)-\hat{y}(t)\right)^2}{\sum_t \left(y(t)-\bar{y}\right)^2}}\right) \times 100\% \tag{54}$$

where y(t), ŷ(t), and ȳ are the true output, the estimated output, and the average of y(t), respectively. The Fit is computed for both cases, where either Classifier I or Classifier II is used to decide the active subsystem. The results, given in Table I, show that the model accuracy is comparable for both types of classifier.

2) Control Using Method I: With an accurate PWARX model of the nonlinear system, we proceed to control the system using the designed control law. The nonlinear system is required to track the reference signal

$$r(t) = \sin\left(\frac{2\pi t}{20}\right) + \sin\left(\frac{2\pi t}{100}\right). \tag{55}$$

The simulation result for the control of the nonlinear system using Method I is shown in Fig. 3. The performance of the controller is very poor. This can be attributed to the fact that the assumption σ(t) = σ(t − 1) does not hold well: the wrong subsystem is assumed at time t, and thus the control signal is computed incorrectly.

3) Control Using Method II: We then use Method II to control the nonlinear system. The control performance improves significantly compared with Method I; however, owing to page limitations and the similarity to the results of Method III below, the figures are not shown here. Despite its good performance, this method carries a large computational burden and cannot be implemented in real time. For instance, using the PWARX model with four subsystems, the simulation of the 200 data points took more than 3 min on a computer with an Intel Core2 Duo CPU (2.53 GHz) and 2.95 GB RAM at 1.59 GHz.

4) Control Using Method III: The simulation result for the control performance using Method III is shown in Fig. 4. As can be observed, the control performance is satisfactory, even though Classifier II uses less information to estimate the partition of the regressor space. Most importantly, the simulation time was very short: only 2 s for the case with six subsystems.

Remark 3: A few other types of models and algorithms have been proposed in the literature to identify (50), e.g., [35], [36], and their identification results could indeed be superior to ours. However, the model structures in [35], [36] cannot easily be used for control system design. In contrast, our method not only provides sufficiently good identification accuracy but also facilitates the design of controllers for nonlinear systems, which is the ultimate goal of this framework.

Fig. 5. Hardware setup of the single-link robotic arm.

VII. EXPERIMENTAL STUDIES

A d.c. motor apparatus by L. J. Electronics, used as a teaching kit in control courses, has been slightly modified into a single-link robotic arm. Specifically, the original load, which is centric, has been replaced by a brass rod

with a heavy brass pendulum at the end. The setup is shown in Fig. 5, and its schematic diagram is shown in Fig. 6.

Fig. 6. Schematic diagram of the single-link robotic arm (labels: DC motor, g, l, α, M).

The simplified physical model of the system is

$$J\ddot{\alpha} = -\beta\dot{\alpha} - Mgl\sin(\alpha) + Ku \tag{56}$$

where J is the overall moment of inertia of the system, including the motor shaft, the rod, and the pendulum, M is the mass of the rod and the pendulum, l is the distance from the pivot of rotation to the center of gravity of the rod and pendulum, g is the gravitational acceleration, β is the damping coefficient, α is the angle of rotation, and u is the input voltage. Determining J, β, l, and K is not straightforward, so it is advantageous to use a data-based approach to identify and control the system.

A. PWARX Model of the Robotic Arm

Because the system has an integrator, and since we do not want the robotic arm to rotate endlessly (i.e., the angle to integrate up to a huge number), we first stabilize the system using a PD-type controller. Note that this PD-type controller does not need to be tuned painstakingly to achieve excellent tracking performance; it merely serves to stabilize the system and to allow input–output data to be acquired.

The PD controller is programmed in MATLAB/Simulink, and a dSPACE DS1104 rapid control prototyping system is used to generate the physical control signal v(t) to the d.c. motor, based on the calculated numerical value of the control signal u(t). Also, the sensor reading of the angle s(t) is sent into the computer via the dSPACE system and interpreted as the numerical value α(t). The working diagram is shown in Fig. 7.

Fig. 7. Hardware-in-the-loop simulation for the single-link robotic arm (signal path: reference r(t), controller, control signal u(t), dSPACE DAC, voltage v(t), d.c. motor with single-link arm, sensor signal s(t), dSPACE ADC, angle α(t)).

For this experiment, we chose a sampling rate of 10 Hz. The input–output PWARX model of the system is then identified from the signals u(t) and α(t), and is of the form

$$
\alpha(t+1) \approx
\begin{cases}
\varphi^T(t)\theta_1^* & \text{if } \tilde{x}(t) \in \tilde{X}_1\\
\quad\vdots & \quad\vdots\\
\varphi^T(t)\theta_N^* & \text{if } \tilde{x}(t) \in \tilde{X}_N
\end{cases}
\tag{57}
$$

where

$$x(t) = [\alpha(t), \alpha(t-1), u(t), u(t-1)]^T \tag{58}$$

$$\varphi(t) = [x(t)^T, 1]^T \tag{59}$$

$$\tilde{x}(t) = [\alpha(t), \alpha(t-1), u(t-1)]^T. \tag{60}$$

The parameters of the two subsystems are identified using the proposed LHM algorithm, whereas the regions X̃i are estimated using multilayer perceptrons. Note that only Classifier II is used in this experiment, because only Method III is used for control later, given the weaknesses of Methods I and II.

1) Training Set: The training set, which consists of 4000 data points, was obtained by setting the reference signal to

$$r(t) = 0.6\sin(2\pi \cdot 0.01\,t) + 0.5\sin(2\pi \cdot 0.1\,t) + 0.3\sin\left(2\pi \cdot 0.4\,t + \frac{\pi}{3}\right) + \pi + \varepsilon(t) \tag{61}$$

where ε(t) is a random number in the range [−0.4, 0.4]. This randomness is introduced so that the dataset is rich in frequency content.

The identification errors for the training set are shown in Fig. 8. As can be seen, the error decreases as more subsystems are used in the PWARX model.

Fig. 8. Identification error for the training set (panels: 1 linear model and PWARX models with 2, 4, and 8 subsystems; axes: data point versus identification error in rad).

2) Test Set: The test set, which contains 1000 data points, was obtained by setting the reference signal to

$$r(t) = 0.7\sin(2\pi \cdot 0.02\,t) + 0.8\sin(2\pi \cdot 0.2\,t) + \pi. \tag{62}$$

The identification errors for the test set are shown in Fig. 9. As the number of subsystems in the PWARX model increases, the identification error decreases. This indicates that the PWARX models generalize well to the test data.
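To make the regressor definitions (58)–(60) concrete, the following sketch generates closed-loop data from the simplified model (56) under a PD-type stabilizing loop driven by the training reference (61), then fits the single affine model (the “1 linear model” case of Figs. 8 and 9) by least squares. Apart from the 10 Hz sampling rate, the 4000-point length, and the form of (61), everything numerical here is an assumption: the values of J, β, M, l, and K, the PD gains, the uniform distribution of ε(t), the zero-order-hold Euler integration, and reading t in (61) as time in seconds are hypothetical choices, not values from the experiment. The LHM algorithm and the multilayer-perceptron classifier are not reproduced.

```python
import numpy as np

# Hypothetical parameters for (56); the paper does not report J, beta, M, l, or K.
J, BETA, M, G, L, K = 0.5, 0.5, 0.5, 9.81, 0.2, 1.0
TS = 0.1           # 10 Hz sampling rate, as in the experiment
SUBSTEPS = 10      # Euler substeps per sample; u(t) is held constant (zero-order hold)
KP, KD = 3.0, 1.5  # hypothetical PD gains, used only to stabilize data collection


def arm_step(alpha, omega, u):
    """Advance (56) by one sampling period with semi-implicit Euler integration."""
    dt = TS / SUBSTEPS
    for _ in range(SUBSTEPS):
        acc = (-BETA * omega - M * G * L * np.sin(alpha) + K * u) / J
        omega += dt * acc
        alpha += dt * omega
    return alpha, omega


def collect_training_data(n=4000, seed=0):
    """Closed-loop data with reference (61); t in (61) is read as seconds (assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) * TS
    r = (0.6 * np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t)
         + 0.3 * np.sin(2 * np.pi * 0.4 * t + np.pi / 3) + np.pi
         + rng.uniform(-0.4, 0.4, n))                 # epsilon(t) in [-0.4, 0.4]
    alpha_log = np.empty(n)
    u_log = np.empty(n)
    alpha, omega, alpha_prev = np.pi, 0.0, np.pi
    for k in range(n):
        alpha_log[k] = alpha
        # PD-type law: proportional on tracking error, derivative on measured angle
        u_log[k] = KP * (r[k] - alpha) - KD * (alpha - alpha_prev) / TS
        alpha_prev = alpha
        alpha, omega = arm_step(alpha, omega, u_log[k])
    return alpha_log, u_log


def arm_regressors(alpha, u):
    """phi(t) from (58)-(59), x_tilde(t) from (60), target alpha(t+1), t = 1..N-2."""
    t = np.arange(1, len(alpha) - 1)
    x = np.column_stack([alpha[t], alpha[t - 1], u[t], u[t - 1]])
    phi = np.column_stack([x, np.ones(len(t))])
    x_tilde = np.column_stack([alpha[t], alpha[t - 1], u[t - 1]])
    return phi, x_tilde, alpha[t + 1]


if __name__ == "__main__":
    alpha, u = collect_training_data()
    phi, x_tilde, target = arm_regressors(alpha, u)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)   # single affine model
    err = target - phi @ theta
    print("RMS one-step error [rad]:", np.sqrt(np.mean(err ** 2)))
```

The random term in the reference enters u(t) through the PD law, which keeps the columns of the regressor matrix from becoming collinear; this mirrors the stated purpose of ε(t) in (61).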
Fig. 9. Identification error for the test set (panels: 1 linear model and PWARX models with 2, 4, and 8 subsystems; axes: data point versus identification error in rad).

Fig. 10. Tracking error of the single-link robotic arm, reference signal 1 (panels: control using 1 linear model and using PWARX models with 2, 4, and 8 subsystems; axes: time step versus tracking error in rad).

Fig. 11. Tracking error of the single-link robotic arm, reference signal 2 (same panel layout as Fig. 10).

TABLE II
VARIANCE OF TRACKING ERROR

No. subsystems        1        2        4        8        PID
Reference signal 1    0.0155   0.0074   0.0040   0.0032   0.0148
Reference signal 2    0.0112   0.0059   0.0031   0.0026   0.0152

Fig. 12. Tracking error of the single-link robotic arm using PID control (panels: reference 1 and reference 2; axes: time step versus tracking error in rad).
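Method III is the scheme used both in the simulation study and in the arm experiment. Because Classifier II decides σ(t) from x̃(t), which excludes u(t), the active affine model is known before u(t) is chosen, so a one-step-ahead control signal has a closed form. The sketch below assumes a cost (y(t+1) − r(t+1))² + λu(t)² with a hypothetical input weight λ; the weighted one-step-ahead controller actually used in the paper may weight its terms differently. The parameter layout follows φ(t) in (52), and `classify` is a hypothetical stand-in for the trained Classifier II.

```python
import numpy as np


def method3_control(y_hist, u_hist, r_next, thetas, classify, lam=0.01):
    """One control step in the spirit of Method III (sketch, not the paper's exact law).

    y_hist   : [y(t), y(t-1), y(t-2)]
    u_hist   : [u(t-1), u(t-2)]
    thetas   : (N, 7) array; row i is theta_i = [a0, a1, a2, b0, b1, b2, c], multiplying
               phi(t) = [y(t), y(t-1), y(t-2), u(t), u(t-1), u(t-2), 1] as in (52)
    classify : maps x_tilde(t) (length 5) to a subsystem index
    lam      : hypothetical weight on u(t)**2
    """
    y_hist = np.asarray(y_hist, dtype=float)
    u_hist = np.asarray(u_hist, dtype=float)
    x_tilde = np.concatenate([y_hist, u_hist])   # [y(t), y(t-1), y(t-2), u(t-1), u(t-2)]
    i = classify(x_tilde)                        # active subsystem, decided without u(t)
    theta = thetas[i]
    b0 = theta[3]                                # coefficient of u(t)
    free = theta[0:3] @ y_hist + theta[4:6] @ u_hist + theta[6]   # prediction with u(t) = 0
    # Minimize (b0*u + free - r_next)**2 + lam*u**2  =>  u = b0*(r_next - free)/(b0**2 + lam)
    u = b0 * (r_next - free) / (b0 ** 2 + lam)
    return u, i


if __name__ == "__main__":
    thetas = np.array([[0.5, 0.1, 0.0, 1.0, 0.2, 0.0, 0.1],
                       [0.3, 0.0, 0.1, 0.8, 0.0, 0.1, -0.1]])
    classify = lambda xt: 0 if xt[0] >= 0 else 1   # toy switching rule on y(t)
    print(method3_control([0.2, 0.1, 0.0], [0.0, 0.0], 1.0, thetas, classify, lam=0.0))
```

With λ = 0 the selected affine model predicts y(t+1) = r(t+1) exactly, which can demand large inputs when b0 is small; a positive λ trades some tracking accuracy for smaller control effort.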

B. Control of the Single-Link Arm

With an accurate model at hand, we now proceed to control the robotic arm for (angular) position tracking. Recall that Classifier II was used directly to estimate the partition of the regressor space; therefore, Method III is applied for the control. Two reference signals are defined:

$$r_1(t) = 1.5\sin(2\pi \cdot 0.05\,t) + \pi \tag{63}$$

$$r_2(t) = \sin(2\pi \cdot 0.03\,t) + 0.8\sin\left(2\pi \cdot 0.1\,t + \frac{\pi}{3}\right) + \pi. \tag{64}$$

The experimental results are shown in Figs. 10 and 11, respectively. It can be observed that the tracking errors decrease as the number of subsystems increases. This improvement is also confirmed numerically by the error variances in Table II.

A proportional–integral–derivative (PID) controller is also designed for comparison. Its parameters are tuned to achieve good tracking performance, and the experimental results are shown in Fig. 12. With two or more subsystems, the proposed weighted one-step-ahead controller with the PWARX model achieves clearly higher tracking accuracy (lower error variance in Table II) than the PID controller.

VIII. CONCLUSION

PWA models represent an attractive model structure for approximating nonlinear systems. They are also very useful for the controller design of nonlinear systems: linear controllers can first be designed for the locally affine subsystems, and the controllers then switch from one to another according to the operating region of the nonlinear system. However, identifying PWA models of a nonlinear system from input–output data is very challenging, because the identification problem is coupled with a data classification problem, i.e., each data point needs to be associated with the most suitable submodel.

In this paper, a procedure for obtaining PWARX models of nonlinear systems was proposed. Two major parameters characterizing a PWARX model, namely the parameters of the locally affine subsystems and the partition of the regressor space, were estimated, the former through a least-squares-based identification method using multiple models, and the latter using standard procedures such as the neural network classifier or the support vector machine classifier. A modified regressor space was proposed for the data classifier, which
circumvents a causality problem in the subsequent controller design. Having obtained the PWARX model of the nonlinear system, a controller was then derived to control the system for reference tracking. Simulation and experimental studies showed that the proposed algorithm could indeed provide accurate PWA approximations of nonlinear systems, and that the designed controller provided good tracking performance.

ACKNOWLEDGMENT

The authors would like to thank the Guest Editor and the anonymous reviewers for their critical comments, which improved the quality of this paper.

REFERENCES

[1] E. D. Sontag, “Nonlinear regulation: The piecewise linear approach,” IEEE Trans. Autom. Control, vol. 26, no. 2, pp. 346–358, Apr. 1981.
[2] A. Juloski, W. P. M. H. Heemels, G. Ferrari-Trecate, R. Vidal, S. Paoletti, and J. H. G. Niessen, “Comparison of four procedures for the identification of hybrid systems,” in Hybrid Systems: Computation and Control, M. Morari and L. Thiele, Eds. New York: Springer-Verlag, 2005, pp. 354–369.
[3] S. Paoletti, A. Lj. Juloski, G. Ferrari-Trecate, and R. Vidal, “Identification of hybrid systems: A tutorial,” Eur. J. Control, vol. 13, nos. 2–3, pp. 242–260, 2007.
[4] W. Heemels, B. De Schutter, and A. Bemporad, “Equivalence of hybrid dynamical models,” Automatica, vol. 37, no. 7, pp. 1085–1091, 2001.
[5] B. De Schutter and T. J. J. van den Boom, “MPC for continuous piecewise-affine systems,” Syst. Control Lett., vol. 52, nos. 3–4, pp. 179–192, Jul. 2004.
[6] L. Rodrigues and J. P. How, “Automated control design for a piecewise-affine approximation of a class of nonlinear systems,” in Proc. Amer. Control Conf., vol. 4. Arlington, VA, 2001, pp. 3189–3194.
[7] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, “A clustering technique for the identification of piecewise affine systems,” Automatica, vol. 39, no. 2, pp. 205–217, Feb. 2003.
[8] R. Vidal, S. Soatto, Y. Ma, and S. Sastry, “An algebraic geometric approach to the identification of a class of linear hybrid systems,” in Proc. IEEE Conf. Decis. Control, vol. 1. Dec. 2003, pp. 167–172.
[9] J. Roll, A. Bemporad, and L. Ljung, “Identification of piecewise affine systems via mixed-integer programming,” Automatica, vol. 40, no. 1, pp. 37–50, 2004.
[10] A. Lj. Juloski, S. Weiland, and W. P. M. H. Heemels, “A Bayesian approach to identification of hybrid systems,” IEEE Trans. Autom. Control, vol. 50, no. 10, pp. 1520–1533, Oct. 2005.
[11] A. Bemporad, A. Garulli, S. Paoletti, and A. Vicino, “A bounded-error approach to piecewise affine system identification,” IEEE Trans. Autom. Control, vol. 50, no. 10, pp. 1567–1580, Oct. 2005.
[12] H. Nakada, K. Takaba, and T. Katayama, “Identification of piecewise affine systems based on statistical clustering technique,” Automatica, vol. 41, no. 5, pp. 905–913, 2005.
[13] A. Bemporad and M. Morari, “Control of systems integrating logic, dynamics, and constraints,” Automatica, vol. 35, no. 3, pp. 407–427, 1999.
[14] A. Bemporad, F. Borrelli, and M. Morari, “Piecewise linear optimal controllers for hybrid systems,” in Proc. Amer. Control Conf., Chicago, IL, 2000, pp. 1190–1194.
[15] A. Bemporad, F. Borrelli, and M. Morari, “Optimal controllers for hybrid systems: Stability and piecewise linear explicit form,” in Proc. 39th IEEE Conf. Decis. Control, Sydney, NSW, Australia, Dec. 2000, pp. 1810–1815.
[16] L. Ozkan, M. V. Kothare, and C. Georgakis, “Model predictive control of nonlinear systems using piecewise linear models,” Comput. Chem. Eng., vol. 24, nos. 2–7, pp. 793–799, Jul. 2000.
[17] B. Aufderheide and B. W. Bequette, “Extension of dynamic matrix control to multiple models,” Comput. Chem. Eng., vol. 27, nos. 8–9, pp. 1079–1096, Sep. 2003.
[18] F. Borrelli, M. Baotic, A. Bemporad, and M. Morari, “Dynamic programming for constrained optimal control of discrete-time linear hybrid systems,” Automatica, vol. 41, no. 10, pp. 1709–1721, 2005.
[19] P. Grieder, M. Kvasnica, M. Baotic, and M. Morari, “Stabilizing low complexity feedback control of constrained piecewise affine systems,” Automatica, vol. 41, no. 10, pp. 1683–1694, 2005.
[20] A. Alessio and A. Bemporad, “Feasible mode enumeration and cost comparison for explicit quadratic model predictive control of hybrid systems,” in Proc. IFAC Conf. Anal. Design Hybrid Syst., Alghero, Italy, 2006, pp. 302–308.
[21] M. Morari and M. Baric, “Recent developments in the control of constrained hybrid systems,” Comput. Chem. Eng., vol. 30, nos. 10–12, pp. 1619–1631, 2006.
[22] J. Thomas, S. Olaru, J. Duisson, and D. Dumur, “Robust model predictive control for piecewise affine systems subject to bounded disturbances,” in Proc. IFAC Conf. Anal. Design Hybrid Syst., 2006, pp. 329–334.
[23] N. N. Nandola and S. Bhartiya, “A multiple model approach for predictive control of nonlinear hybrid systems,” J. Process Control, vol. 18, no. 2, pp. 131–148, Feb. 2008.
[24] E. F. Camacho, D. R. Ramirez, D. Limon, D. M. de Pena, and T. Alama, “Model predictive control techniques for hybrid systems,” Annu. Rev. Control, vol. 34, no. 1, pp. 21–31, Apr. 2010.
[25] M. Kvasnica, P. Grieder, M. Baotic, and F. J. Christophersen. (2006). Multi-Parametric Toolbox [Online]. Available: http://control.ee.ethz.ch/~mpt/
[26] C. Xiang, C. Y. Lai, T. H. Lee, and K. S. Narendra, “A general framework for least-squares based identification of time-varying system using multiple models,” in Proc. IEEE Int. Conf. Control Autom., Christchurch, New Zealand, Dec. 2009, pp. 212–219.
[27] C. Y. Lai, C. Xiang, and T. H. Lee, “Identification and control of nonlinear systems using piecewise affine models,” in Proc. IEEE Conf. Decis. Control, Atlanta, GA, Dec. 2010, pp. 6395–6402.
[28] C. Xiang, “Existence of global input-output model for nonlinear systems,” in Proc. IEEE Int. Conf. Control Autom., vol. 1. Budapest, Hungary, Jun. 2005, pp. 125–130.
[29] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Stat. Soc. B, vol. 39, no. 1, pp. 1–37, 1977.
[30] C. Cortes and V. Vapnik, “Support vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[31] J. Shawe-Taylor and N. Cristianini, “On the generalization of soft margin algorithms,” IEEE Trans. Inf. Theory, vol. 48, no. 10, pp. 2721–2735, Oct. 2002.
[32] D. Liberzon, Switching in Systems and Control. Boston, MA: Birkhäuser, 2003.
[33] H. Lin and P. J. Antsaklis, “Stability and stabilizability of switched linear systems: A survey of recent results,” IEEE Trans. Autom. Control, vol. 54, no. 2, pp. 308–322, Feb. 2009.
[34] K. S. Narendra and S. M. Li, “Neural networks in control systems,” in Mathematical Perspectives on Neural Networks, P. Smolensky, M. C. Mozer, and D. E. Rumelhart, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 1996.
[35] C. Wen, S. Wang, X. Jin, and X. Ma, “Identification of dynamic systems using piecewise-affine basis function models,” Automatica, vol. 43, no. 10, pp. 1824–1831, 2007.
[36] J. Xu, X. Huang, and S. Wang, “Adaptive hinging hyperplanes and its applications in dynamic system identification,” Automatica, vol. 45, no. 10, pp. 2325–2332, Oct. 2009.

Chow Yin Lai received the Diplom-Ingenieur (Fachhochschule) degree in mechatronics and microsystems technology from the University of Heilbronn, Heilbronn, Germany, and the Ph.D. degree in electrical and computer engineering from the Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, in 2006 and 2011, respectively.
He is currently a Research Scientist with the Singapore Institute of Manufacturing Technology, Agency for Science, Technology and Research, Singapore. His current research interests include identification and control of nonlinear systems using multiple models.
Cheng Xiang (M’01) received the B.S. degree in mechanical engineering from Fudan University, Shanghai, China, the M.S. degree in mechanical engineering from the Institute of Mechanics, Chinese Academy of Sciences, Beijing, China, and the Ph.D. degree in electrical engineering from Yale University, New Haven, CT, in 1991, 1994, and 2000, respectively.
He was a Financial Engineer with Fannie Mae, Washington, D.C., from 2000 to 2001. He has been with the National University of Singapore, Singapore, since 2001. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, National University of Singapore. His current research interests include pattern recognition, intelligent controls, and systems biology.

Tong Heng Lee (M’88) received the B.A. degree (first class honors) in the engineering tripos from Cambridge University, Cambridge, U.K., and the Ph.D. degree from Yale University, New Haven, CT, in 1980 and 1987, respectively.
He is a Professor with the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore, and also a Professor with the NUS Graduate School, NUS. He was a past Vice-President in Research, NUS. He has co-authored five research monographs (books), and holds four patents (two of which are in the technology area of adaptive systems, and the other two in the area of intelligent mechatronics). He has published more than 300 international journal papers. His current research interests include the areas of adaptive systems, knowledge-based controls, intelligent mechatronics, and computational intelligence.
Dr. Lee was a recipient of the Cambridge University Charles Baker Prize in Engineering, the Asian Control Conference (ASCC) (Melbourne) Best Industrial Control Application Paper Prize in 2004, the IEEE International Conference on Mechatronics and Automation Best Paper in Automation Prize in 2009, and the ASCC Best Application Paper Prize in 2009. He was an Invited Panelist at the World Automation Congress (WAC), WAC2000 Maui, an Invited Keynote Speaker for the IEEE International Symposium on Intelligent Control, Houston, in 2003, an Invited Keynote Speaker for the Life System Modeling and Simulation (LSMS) in 2007, Shanghai, China, an Invited Expert Panelist for the IEEE Advanced Intelligent Mechatronics in 2009, an Invited Plenary Speaker for the International Association of Science and Technology for Development (IASTED) Rewriting Techniques and Applications, Beijing, China, in 2009, an Invited Keynote Speaker for LSMS, Shanghai, China, in 2010, an Invited Keynote Speaker for the IASTED Control and Applications, Banff, AB, Canada, in 2010, an Invited Keynote Speaker for the IFToMM International Conference on Digital Manufacturing and Automation, Changsha, China, in 2010, and an Invited Keynote Speaker for the International Conference on Unmanned Aircraft Systems, Denver, in 2011. He is currently an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, the IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, Control Engineering Practice (an International Federation of Automatic Control (IFAC) Journal), and the International Journal of Systems Science (Taylor and Francis, London). He is the Deputy Editor-in-Chief of the IFAC Mechatronics Journal.
