A Learning Algorithm for Piecewise Linear Regression

Giancarlo Ferrari-Trecate¹, Marco Muselli², Diego Liberati³, Manfred Morari¹

¹ Institut für Automatik, ETHZ - ETL, CH-8092 Zürich, Switzerland
² Istituto per i Circuiti Elettronici - CNR, via De Marini, 6 - 16149 Genova, Italy
³ Ce.S.T.I.A. - CNR c/o Politecnico di Milano, Piazza Leonardo da Vinci, 32 - 20133 Milano, Italy

Abstract
A new learning algorithm for solving piecewise linear regression problems is proposed. It is able to train a proper multilayer feedforward neural network so as to reconstruct a target function assuming a different linear behavior on each region of a polyhedral partition of the input domain. The proposed method combines local estimation, clustering in the weight space, classification, and regression in order to achieve the desired result. A simulation on a benchmark problem shows the good properties of this new learning algorithm.

1 Introduction
Real-world problems to be solved by artificial neural networks are normally subdivided into two groups according to the range of values assumed by the output. If the output is Boolean or nominal, we speak of classification problems; when it is coded by a continuous variable, we are facing a regression problem. In most cases, the techniques employed to train a connectionist model depend on the kind of problem we are dealing with.
However, applications can be found that lie on the borderline between classification and regression; these occur when the input space can be subdivided into disjoint regions X_i characterized by different behaviors of the function f to be reconstructed. The target of the learning problem is consequently twofold: by analyzing a set of samples of f, possibly affected by noise, it has to generate both the collection of regions X_i and the behavior of the unknown function f in each of them.
If the region X_i corresponding to each sample in the training set were known, we could add the index i of the region as an output, thus obtaining a classification problem whose target is to find the effective form of each X_i. On the other hand, if the actual partition X_i were known, we could solve several regression problems to find the behavior of the function f within each X_i. Because of this mixed nature, classical techniques for neural network training cannot be directly applied; specific methods are necessary to deal with this kind of problem.
Perhaps the simplest situation one can think of is piecewise linear regression: in this case the regions X_i are polyhedra and the behavior of the function f in each X_i can be modeled by a linear expression. Several authors have treated this kind of problem [2, 3, 4, 8], providing algorithms for reaching the desired result. Unfortunately, most of them are difficult to extend beyond two dimensions [2], whereas others consider only local approximations [3, 4], thus missing the effective extension of the regions X_i.
In this contribution a new training algorithm for neural networks solving
piecewise linear regression problems is proposed. It combines clustering and
supervised learning to obtain the correct values for the weights of a proper
multilayer feedforward architecture.

2 The piecewise linear regression problem


Let X be a polyhedron in the n-dimensional space R^n and X_i, i = 1, ..., s, a polyhedral partition of X, i.e. X_i ∩ X_j = ∅ for every i ≠ j and ∪_{i=1}^{s} X_i = X. The target of a Piecewise Linear Regression (PLR) problem is to reconstruct an unknown function f : X → R having a linear behavior in each region X_i,

    f(x) = z_i = w_{i0} + \sum_{j=1}^{n} w_{ij} x_j , \quad x \in X_i ,

when only a training set S containing m samples (x_k, y_k), k = 1, ..., m, is available. The output y_k gives an evaluation of f(x_k) subject to noise, with x_k ∈ X; the region X_i to which x_k belongs is not known in advance. The scalars w_{i0}, w_{i1}, ..., w_{in}, for i = 1, ..., s, uniquely characterize the function f, and their estimate is a target of the PLR problem; for notational purposes they will be collected in a vector w_i.
Since the regions X_i are polyhedral, they can be defined by a set of l_i linear inequalities of the following kind:

    a_{ij0} + \sum_{k=1}^{n} a_{ijk} x_k \le 0 , \quad j = 1, \ldots, l_i     (1)

The scalars a_{ijk}, for j = 1, ..., l_i and k = 0, 1, ..., n, can be collected in a matrix A_i, whose estimate is also a target of the reconstruction process for every i = 1, ..., s. Discontinuities may be present in the function f at the boundaries between two regions X_i.
Following the general idea presented in [8], a neural network realizing a piecewise linear function f of this kind can be modeled as in Fig. 1. It contains a gate layer that verifies the inequalities (1) and decides which of the terms z_i must be used as the output y of the whole network.
[Figure 1 here: a feedforward network with input layer x_1, ..., x_n; a hidden layer of s linear units with weight vectors w_1, ..., w_s producing z_1, ..., z_s; a gate layer with matrices A_1, ..., A_s; and an output layer summing the gated terms to give y.]

Figure 1: General neural network realizing a piecewise linear function.

Thus, the i-th unit in the gate layer has output equal to its input z_i if all the constraints (1) are satisfied for j = 1, ..., l_i, and equal to 0 otherwise. All the other units perform a weighted sum of their inputs; the weights of the output neuron, which has no bias, are always set to 1.
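
To make the gate layer concrete, the following sketch (in Python with NumPy; it is not part of the original paper, and all names are ours) evaluates such a network, opening the i-th gate only when every inequality collected in A_i is satisfied:

import numpy as np

def pwl_network(x, A_list, W):
    """Evaluate the piecewise linear network of Fig. 1 (illustrative sketch).

    x      : input vector of length n
    A_list : list of s matrices A_i of shape (l_i, n+1); row j stores
             [a_ij0, a_ij1, ..., a_ijn] for inequality (1)
    W      : array of shape (s, n+1); row i stores [w_i0, w_i1, ..., w_in]
    """
    x1 = np.concatenate(([1.0], x))    # prepend 1 to absorb the bias terms
    y = 0.0
    for A_i, w_i in zip(A_list, W):
        z_i = w_i @ x1                 # hidden unit: z_i = w_i0 + sum_j w_ij x_j
        if np.all(A_i @ x1 <= 0):      # gate unit: open only if x lies in X_i
            y += z_i                   # output neuron sums gated terms with weight 1
    return y

Since the regions X_i partition X, exactly one gate opens for each input and the output sum reduces to the single active term z_i.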

3 The proposed learning algorithm


As previously noted, the solution of a PLR problem requires a technique that combines classification and regression: the former has the aim of finding the matrices A_i to be inserted in the gate layer of the neural network (Fig. 1), whereas the latter provides the weight vectors w_i for the input-to-hidden layer connections. A method of this kind is reported in Fig. 2; it is composed of four steps, each of which is devoted to a specific task.
The first of them (Step 1) has the aim of obtaining a first estimate of the weight vectors w_i by performing local linear regressions based on small subsets of the whole training set S. In fact, points x_k that are close to each other are likely to belong to the same region X_i. Then, for each sample (x_k, y_k), with k = 1, ..., m, we build a set C_k containing (x_k, y_k) and the c − 1 distinct pairs (x, y) ∈ S that score the lowest values of the distance ‖x − x_k‖.

The parameter c can be freely chosen, though the inequality c ≥ n must be respected to perform the linear regression. It can be easily seen that some sets C_k, called mixed, will contain input patterns belonging to different regions X_i. They lead to wrong estimates for w_i and consequently their number must be kept to a minimum; this can be obtained by lowering the value of c. However,
ALGORITHM FOR PIECEWISE LINEAR REGRESSION

1. (Local regression) For every k = 1, ..., m do
   1a. Form the set C_k containing the pair (x_k, y_k) and the samples (x, y) ∈ S associated with the c − 1 nearest neighbors x to x_k.
   1b. Perform a linear regression to obtain the weight vector v_k of a linear unit fitting the samples in C_k.

2. (Clustering) Perform a clustering process in the space R^{n+1} to subdivide the set of weight vectors v_k into s groups V_i.

3. (Classification) Build a new training set S′ containing the m pairs (x_k, i_k), where V_{i_k} is the cluster including v_k. Train a multicategory classification method to produce the matrices A_i for the regions X_i.

4. (Regression) For every i = 1, ..., s perform a linear regression on the samples (x, y) ∈ S with x ∈ X_i to obtain the weight vector w_i for the i-th unit in the hidden layer.

Figure 2: Proposed learning method for piecewise linear regression.

the quality of the estimate improves when the size c of the sets C_k increases; a tradeoff must therefore be attained in selecting a reasonable value for c.
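
As an illustration of Step 1, the sketch below (Python with NumPy; parameter names are ours) builds every set C_k from the c − 1 nearest neighbors of x_k and fits the corresponding weight vector by least squares:

import numpy as np

def local_weight_vectors(X, y, c):
    """Step 1 sketch: fit a local linear model around every training sample.

    X : (m, n) array of input patterns x_k
    y : (m,) array of noisy outputs y_k
    c : size of each local set C_k
    Returns V, an (m, n+1) array whose k-th row is the local estimate v_k.
    """
    m, n = X.shape
    X1 = np.hstack([np.ones((m, 1)), X])       # design matrix with bias column
    V = np.empty((m, n + 1))
    for k in range(m):
        d = np.linalg.norm(X - X[k], axis=1)   # distances ||x - x_k||
        idx = np.argsort(d)[:c]                # x_k itself plus its c-1 nearest neighbors
        V[k], *_ = np.linalg.lstsq(X1[idx], y[idx], rcond=None)
    return V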
Denote by v_k the weight vector of the linear unit produced through the linear regression on the samples in C_k. If the generation of the samples in the training set is not affected by noise, most of the v_k coincide with the desired weight vectors w_i; only mixed sets C_k yield spurious vectors v_k, which can be considered as outliers. Nevertheless, even in the presence of noise, a clustering algorithm (Step 2) can be used to determine the sets V_i of vectors v_k associated with the same w_i. A proper version of the K-means algorithm [6] can be adopted to this aim if the number s of regions is fixed beforehand; otherwise, adaptive techniques, such as the Growing Neural Gas [7], can be employed to find the value of s at the same time.
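
A minimal sketch of Step 2, assuming the number s of regions is known and substituting the standard K-means of scikit-learn for the modified version of [6]:

from sklearn.cluster import KMeans

def cluster_weight_vectors(V, s):
    """Step 2 sketch: group the local weight vectors v_k into s clusters V_i.

    Plain K-means stands in here for the noise-aware variant of [6]; the
    spurious v_k coming from mixed sets C_k act as outliers, so a more
    robust clustering may be preferable in practice.
    Returns (labels, centers): labels[k] is the cluster index i_k assigned
    to v_k, and centers[i] is a rough first estimate of w_i.
    """
    km = KMeans(n_clusters=s, n_init=10, random_state=0).fit(V)
    return km.labels_, km.cluster_centers_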
The sets V_i generated by the clustering process induce a classification of the input patterns x_k belonging to the training set S. As a matter of fact, if v_k ∈ V_i for a given i, the set C_k is fitted by the linear neuron with weight vector w_i and consequently x_k is located in the region X_i. The effective extension of this region can be determined by solving a linear multicategory classification problem (Step 3), whose training set S′ is built by adding as output to each input pattern x_k the index i_k of the set V_{i_k} to which the corresponding vector v_k belongs.
To avoid the presence of multiply classified points or of unclassified patterns in the input space, proper techniques [1] based on linear and quadratic programming can be employed. In this way the s matrices A_i for the gate layer are generated; they can include redundant rows that are not necessary for the determination of the polyhedral regions X_i. These rows can be removed by applying standard linear programming techniques.
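
To suggest the shape of Step 3, the sketch below trains an off-the-shelf linear multiclass classifier on S′ and reads a polyhedral description of each region off its decision functions; it is a simplified stand-in for the method of [1], which additionally guarantees that no point is multiply classified or left unclassified:

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_region_matrices(X, labels):
    """Step 3 sketch: estimate matrices A_i from the set S' = {(x_k, i_k)}.

    A linear multiclass classifier gives one score g_i(x) = b_i + c_i . x
    per region; X_i is where g_i is largest, i.e. the polyhedron
    (b_j - b_i) + (c_j - c_i) . x <= 0 for all j != i. Assumes s >= 3
    (scikit-learn collapses the binary case to a single score vector).
    Returns a list of s matrices A_i with rows [a_ij0, a_ij1, ..., a_ijn].
    """
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    b, C = clf.intercept_, clf.coef_
    s = len(b)
    return [np.array([np.concatenate(([b[j] - b[i]], C[j] - C[i]))
                      for j in range(s) if j != i])
            for i in range(s)]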

Figure 3: Simulation results for a benchmark problem: a) unknown piecewise linear function f and training set S; b) function realized by the trained neural network (dashed line).

Finally, the weight vectors w_i for the neural network in Fig. 1 can be directly obtained by solving s linear regression problems (Step 4) having as training sets the samples (x, y) ∈ S with x ∈ X_i, where X_1, ..., X_s are the regions built by the classification process.
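
A corresponding sketch of Step 4 (again NumPy; region_of is a hypothetical helper that returns the index of the region containing x, e.g. by checking the inequalities defined by the matrices A_i from Step 3):

import numpy as np

def final_regression(X, y, region_of, s):
    """Step 4 sketch: one least-squares fit per region.

    region_of : hypothetical helper mapping an input pattern x to its
                region index, e.g. built from the matrices A_i of Step 3
    Returns W, an (s, n+1) array whose i-th row is the weight vector w_i.
    """
    m, n = X.shape
    X1 = np.hstack([np.ones((m, 1)), X])       # design matrix with bias column
    idx = np.array([region_of(x) for x in X])  # region index of every sample
    W = np.empty((s, n + 1))
    for i in range(s):
        W[i], *_ = np.linalg.lstsq(X1[idx == i], y[idx == i], rcond=None)
    return W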

4 Simulation results
The proposed algorithm for piecewise linear regression has been tested on a one-dimensional benchmark problem in order to analyze the quality of the resulting neural network. The unknown function to be reconstructed is the following:

    f(x) = \begin{cases} -x & \text{if } -4 \le x \le 0 \\ x & \text{if } 0 < x < 2 \\ 2 + 3x & \text{if } 2 \le x \le 4 \end{cases}     (2)

with X = [−4, 4] and s = 3. A training set S containing m = 100 samples (x, y) has been generated, where y = f(x) + ε and ε is a normal random variable with zero mean and variance σ² = 0.05. The behavior of f(x) together with the elements of S is depicted in Fig. 3a.
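
For concreteness, a training set of this kind can be generated along the following lines (NumPy sketch; drawing the inputs uniformly from X is our assumption, since the paper does not specify the sampling scheme):

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Benchmark function (2): three linear pieces on X = [-4, 4]."""
    return np.where(x <= 0, -x, np.where(x < 2, x, 2 + 3 * x))

m = 100
x = rng.uniform(-4, 4, m)                     # inputs drawn from X (our assumption)
y = f(x) + rng.normal(0.0, np.sqrt(0.05), m)  # noise with zero mean, variance 0.05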
The method described in Fig. 2 has been applied by choosing at Step 1 the value c = 6. At Step 2 the number s of regions has been supposed to be known, thus allowing the application of the K-means clustering algorithm [5]; a proper definition of the norm has been employed to improve the convergence of the clustering process [6]. Multicategory classification (Step 3) has then been performed by using the method described in [1], which can be easily extended to realize nonlinear boundaries among the X_i when treating a multidimensional problem. Finally, least squares estimation is adopted to generate the vectors w_i for the piecewise linear regression. The resulting neural network realizes the following function, represented as a dashed line in Fig. 3b:

    f(x) = \begin{cases} -0.0043 - 0.9787 x & \text{if } -4 \le x \le -0.24 \\ 0.0899 + 0.9597 x & \text{if } -0.24 < x < 2.12 \\ 1.8208 + 3.0608 x & \text{if } 2.12 \le x \le 4 \end{cases}

As one can note, this is a good approximation of the unknown function (2). Errors can be detected only at the boundaries between two adjacent regions X_i; they are mainly due to the effect of mixed sets C_k on the classification process.

References
[1] E. J. Bredensteiner and K. P. Bennett, Multicategory classification by support vector machines. Computational Optimization and Applications, 12 (1999) 53–79.

[2] V. Cherkassky and H. Lari-Najafi, Constrained topological mapping for nonparametric regression analysis. Neural Networks, 4 (1991) 27–40.

[3] C.-H. Choi and J. Y. Choi, Constructive neural networks with piecewise interpolation capabilities for function approximation. IEEE Transactions on Neural Networks, 5 (1994) 936–944.

[4] J. Y. Choi and J. A. Farrell, Nonlinear adaptive control using networks of piecewise linear approximators. IEEE Transactions on Neural Networks, 11 (2000) 390–401.

[5] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. (1973) New York: John Wiley and Sons.

[6] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, A clustering technique for the identification of piecewise affine systems. Accepted at the Fourth International Workshop on Hybrid Systems: Computation and Control, Roma, Italy, March 28-30, 2001.

[7] B. Fritzke, A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems 7 (1995) Cambridge, MA: MIT Press, 625–632.

[8] K. Nakayama, A. Hirano, and A. Kanbe, A structure trainable neural network with embedded gating units and its learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (2000) Como, Italy, III–253–258.
