


Approximate Solutions to Ordinary Differential Equations Using Least Squares Support Vector Machines

Siamak Mehrkanoon, Tillmann Falck, and Johan A. K. Suykens, Senior Member, IEEE

Abstract—In this paper, a new approach based on least squares support vector machines (LS-SVMs) is proposed for solving linear and nonlinear ordinary differential equations (ODEs). The approximate solution is presented in closed form by means of LS-SVMs, whose parameters are adjusted to minimize an appropriate error function. For the linear and nonlinear cases, these parameters are obtained by solving a system of linear and nonlinear equations, respectively. The method is well suited to solving mildly stiff, nonstiff, and singular ODEs with initial and boundary conditions. Numerical results demonstrate the efficiency of the proposed method over existing methods.

Index Terms—Closed-form approximate solution, collocation method, least squares support vector machines (LS-SVMs), ordinary differential equations (ODEs).

Manuscript received July 7, 2011; revised May 14, 2012; accepted May 15, 2012. Date of publication June 22, 2012; date of current version August 1, 2012. This work was supported in part by the Research Council KUL GOA/11/05 Ambiorics, GOA/10/09 MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several Ph.D., Post-Doctoral and Fellowship Grants, the Flemish Government (FWO: Ph.D. and Post-Doctoral Grants, Project G0226.06 (cooperative systems and optimization), Project G0321.06 (Tensors), Project G.0302.07 (SVM/Kernel), Project G.0320.08 (convex MPC), Project G.0558.08 (Robust MHE), Project G.0557.08 (Glycemia2), Project G.0588.09 (Brain-Machine), Project G.0377.12 (structured models), research communities (WOG: ICCoS, ANMMM, MLDM), Project G.0377.09 (Mechatronics MPC)), IWT (Ph.D. Grants, Eureka-Flite+, SBO LeCoPro, SBO Climaqs, SBO POM, O&O-Dsquare), the Belgian Federal Science Policy Office (IUAP P6/04: DYSCO, Dynamical Systems, Control and Optimization, 2007–2011), EU (ERNSI), FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, FP7-EMBOCON (ICT-248940), Contract Research (AMINAL), Helmholtz (viCERP), ACCM, Bauknecht, Hoerbiger, and the ERC under Advanced Grant A-DATADRIVE-B.

The authors are with the Department of Electrical Engineering ESAT-SCD-SISTA, Katholieke Universiteit Leuven, Leuven B-3001, Belgium (e-mail: siamak.mehrkanoon@esat.kuleuven.be; tillmann.falck@esat.kuleuven.be; johan.suykens@esat.kuleuven.be).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2012.2202126

I. INTRODUCTION

DIFFERENTIAL equations can be found in the mathematical formulation of physical phenomena in a wide variety of applications, especially in science and engineering [1], [2]. Depending upon the form of the boundary conditions to be satisfied by the solution, problems involving ordinary differential equations (ODEs) can be divided into two main categories, namely initial value problems (IVPs) and boundary value problems (BVPs). Analytic solutions for these problems are not generally available, and hence numerical methods must be applied.

Many methods have been developed for solving IVPs of ODEs, such as Runge–Kutta, finite difference, predictor–corrector, and collocation methods [3]–[6]. Generally speaking, numerical methods for approximating the solution of BVPs fall into two classes: difference methods (e.g., the shooting method) and weighted residual or series methods. In the shooting method, one tries to reduce the problem to an IVP by providing a sufficiently good approximation of the derivative values at the initial point.

Concerning higher order ODEs, the most common approach is to reduce the problem to a system of first-order differential equations and then solve the system by employing one of the available methods, an approach that has been studied extensively in the literature, see [4], [7], and [8]. However, as some authors have remarked, this approach wastes a lot of computer time and human effort [9], [10].

Most of the traditional numerical methods provide the solution, in the form of an array, at specific preassigned mesh points in the domain (discrete solution), and they need an additional interpolation procedure to yield the solution on the whole domain. In order to obtain an accurate solution, one either has to increase the order of the method or decrease the step size. This, however, increases the computational cost.

To overcome these drawbacks, attempts have been made to develop new approaches that not only solve higher order ODEs directly, without reducing them to a system of first-order differential equations, but also provide the approximate solution in closed form (i.e., continuous and differentiable), thereby avoiding an extra interpolation procedure. One of these classes of methods is based on the use of neural network models, see [11]–[17]. Lee and Kang [12] used Hopfield neural network models to solve first-order differential equations. The authors in [18] introduced a method based on feedforward neural networks to solve ordinary and partial differential equations. In that model, the approximate solution was chosen such that it, by construction, satisfied the supplementary conditions. Therefore, the model function was expressed as a sum of two terms. The first term, which contains no adjustable parameters, satisfied the initial/boundary conditions, and the second term involved a feedforward neural network to be trained. An unsupervised kernel least mean square algorithm was developed for solving ODEs in [19].
Despite the fact that classical neural networks have nice properties, such as universal approximation, they still suffer from two persistent drawbacks. The first problem is the existence of many local minima. The second problem is how to choose the number of hidden units.

Support vector machines (SVMs) are a powerful method for solving pattern recognition and function estimation problems [20], [21]. In this method, one maps the data into a high-dimensional feature space and solves a linear regression problem there, which leads to solving quadratic programming problems. Least squares support vector machines (LS-SVMs) for function estimation, classification, problems in unsupervised learning, and others have been investigated in [22]–[26]. In this case, the problem formulation involves equality instead of inequality constraints, and the training for regression and classification problems is then done by solving a set of linear equations. It is the purpose of this paper to introduce a new approach based on LS-SVMs for solving ODEs.

This paper uses the following notation. Vector-valued variables are denoted in lowercase boldface, whereas variables that are neither boldfaced nor capitalized are scalar valued. Matrices are denoted in capitals. Euler script (euscript) font is used for operators.

This paper is organized as follows. In Section II, the problem statement is given. In Section III, we formulate our LS-SVM method for the solution of linear differential equations. Section IV is devoted to the formulation of the method for nonlinear first-order ODEs. Model selection and the practical implementation of the proposed method are discussed in Section V. Section VI describes the numerical experiments, discussion, and comparison with other known methods.

II. PROBLEM STATEMENT

This section describes the problem statement. In Section II-A, a short introduction to LS-SVMs for regression is given to highlight the difference to the problem considered in this paper. Finally, some operators that will be used in the following sections are defined.

Consider the general m-th order linear ODE with time-varying coefficients of the form

    \mathcal{L}[y] \equiv \sum_{\ell=0}^{m} f_\ell(t)\, y^{(\ell)}(t) = r(t), \qquad t \in [a, c]    (1)

where \mathcal{L} represents an m-th order linear differential operator, [a, c] is the problem domain, and r(t) is the input signal. The f_\ell(t) are known functions and y^{(\ell)}(t) denotes the \ell-th derivative of y with respect to t. The necessary initial or boundary conditions for solving the above differential equation are

    IVP:  IC_\mu[y(t)] = p_\mu, \quad \mu = 0, \ldots, m-1
    BVP:  BC_\mu[y(t)] = q_\mu, \quad \mu = 0, \ldots, m-1

where the IC_\mu are the initial conditions (all constraints are applied at the same value of the independent variable, i.e., t = a) and the BC_\mu are the boundary conditions (the constraints are applied at multiple values of the independent variable t, typically at the ends of the interval [a, c] in which the solution is sought). p_\mu and q_\mu are given scalars.

A differential equation (1) is said to be stiff when its exact solution consists of a steady-state term that does not grow significantly with time, together with a transient term that decays exponentially to zero. Problems involving rapidly decaying transient solutions occur naturally in a wide variety of applications, including the study of damped mass-spring systems and the analysis of control systems (see [4] for more details).

If the coefficient functions f_\ell(t) of (1) fail to be analytic at the point t = a, then (1) is called a singular ODE.

The approaches given in [18] and [19] define a trial solution as a sum of two terms, i.e., y(t) = H(t) + F(t, N(t, P)). The first term H(t), which has to be defined by the user and in some cases is not straightforward to construct, satisfies the initial/boundary conditions, and the second term F(t, N(t, P)) is a single-output feedforward neural network with input t and parameters P. In contrast with the approaches given in [18] and [19], we build the model by incorporating the initial/boundary conditions as constraints of an optimization problem. This significantly reduces the burden placed on the user, as a potentially difficult problem is handled automatically by the proposed technique.

A. LS-SVM Regression

Let us consider a given training set {x_i, y_i}_{i=1}^{N} with input data x_i ∈ R and output data y_i ∈ R. For the purpose of this paper, we only use a one-dimensional input space. The goal in a regression problem is to estimate a model of the form y(x) = w^T φ(x) + b.

The primal LS-SVM model for regression can be written as follows [23]:

    \underset{w, b, e}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e
    \text{s.t.} \;\; y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N    (2)

where γ ∈ R_+, b ∈ R, w ∈ R^h, φ(·): R → R^h is the feature map, and h is the dimension of the feature space. The dual solution is then given by

    \begin{bmatrix} \Omega + I_N/\gamma & 1_N \\ 1_N^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}

where Ω_{ij} = K(x_i, x_j) = φ(x_i)^T φ(x_j) is the ij-th entry of the positive definite kernel matrix, 1_N = [1, ..., 1]^T ∈ R^N, α = [α_1, ..., α_N]^T, y = [y_1, ..., y_N]^T, and I_N is the identity matrix. The model in the dual form becomes y(x) = Σ_{i=1}^{N} α_i K(x, x_i) + b. It should be noted that if b = 0 and the feature map φ is explicitly known and finite dimensional, the problem could be solved in the primal (ridge regression) by eliminating e, and then w would be the only unknown. In the LS-SVM approach, however, the feature map φ is in general not explicitly known and can be infinite dimensional. Therefore, the kernel trick is used and the problem is solved in the dual [22].
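As an illustration of the regression formulation above, the following minimal sketch assembles and solves the dual linear system for a small synthetic data set with an RBF kernel. It is our own illustrative code, not the authors' implementation (the experiments in the paper were done in MATLAB); the data, parameter values, and function names are assumptions made here for the example.

```python
import numpy as np

def rbf_kernel(u, v, sigma):
    """Gaussian RBF kernel K(u, v) = exp(-(u - v)^2 / sigma^2), as used later in the paper."""
    return np.exp(-(u - v) ** 2 / sigma ** 2)

# Synthetic 1-D regression data (assumed example, not from the paper).
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.05 * np.random.randn(x.size)

gamma, sigma = 1e4, 0.2
N = x.size

# Kernel matrix Omega with entries K(x_i, x_j).
Omega = rbf_kernel(x[:, None], x[None, :], sigma)

# Dual system  [[Omega + I/gamma, 1_N], [1_N^T, 0]] [alpha; b] = [y; 0].
A = np.zeros((N + 1, N + 1))
A[:N, :N] = Omega + np.eye(N) / gamma
A[:N, N] = 1.0
A[N, :N] = 1.0
rhs = np.append(y, 0.0)

sol = np.linalg.solve(A, rhs)
alpha, b = sol[:N], sol[N]

# Dual model evaluation: y_hat(x) = sum_i alpha_i K(x, x_i) + b.
x_test = np.linspace(0.0, 1.0, 200)
y_hat = rbf_kernel(x_test[:, None], x[None, :], sigma) @ alpha + b
```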
1358 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 23, NO. 9, SEPTEMBER 2012

When we deal with differential equations, the target values y_i are not directly available, so the regression approach does not directly apply. Nevertheless, we can incorporate the underlying differential equation in the learning process to find an approximation for the solution.

Let us assume an explicit model ŷ(t) = w^T φ(t) + b as an approximation for the solution of the differential equation. Since there are no data available to learn from, we have to substitute our model into the given differential equation. Therefore, we need to define derivatives of the kernel function. Making use of Mercer's theorem [21], derivatives of the feature map can be written in terms of derivatives of the kernel function [27]. Let us define the following differential operator, which will be used in subsequent sections:

    \nabla_n^m \equiv \frac{\partial^{n+m}}{\partial u^n\, \partial v^m}.    (3)

If φ(u)^T φ(v) = K(u, v), then one can show that

    [\varphi^{(n)}(u)]^T \varphi^{(m)}(v) = \nabla_n^m [\varphi(u)^T \varphi(v)] = \nabla_n^m [K(u, v)] = \frac{\partial^{n+m} K(u, v)}{\partial u^n\, \partial v^m}.    (4)

Using (4), it is possible to express all derivatives of the feature map in terms of the kernel function itself (provided that the kernel function is sufficiently differentiable). For instance, the following relations hold:

    \nabla_1^0 [K(u, v)] = \frac{\partial (\varphi(u)^T \varphi(v))}{\partial u} = \varphi^{(1)}(u)^T \varphi(v)
    \nabla_0^1 [K(u, v)] = \frac{\partial (\varphi(u)^T \varphi(v))}{\partial v} = \varphi(u)^T \varphi^{(1)}(v)
    \nabla_2^0 [K(u, v)] = \frac{\partial^2 (\varphi(u)^T \varphi(v))}{\partial u^2} = \varphi^{(2)}(u)^T \varphi(v).
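Since the method only ever needs derivatives of the kernel function via (4), one convenient way to obtain such derivatives for any sufficiently smooth kernel is symbolic differentiation. The sketch below does this for the Gaussian RBF kernel with SymPy; it is an illustrative sketch under our own naming (the closed-form RBF derivatives actually used in the experiments are listed in Section VI).

```python
import numpy as np
import sympy as sp

u, v = sp.symbols('u v', real=True)
sigma = sp.symbols('sigma', positive=True)
K = sp.exp(-(u - v) ** 2 / sigma ** 2)   # Gaussian RBF kernel K(u, v)

def nabla(n, m):
    """Return a vectorized numerical function for nabla_n^m K = d^{n+m} K / (du^n dv^m)."""
    expr = sp.diff(K, u, n, v, m)
    return sp.lambdify((u, v, sigma), expr, 'numpy')

# Example: build [Omega_1^1]_{ij} = nabla_1^1 K(t_i, t_j) on a set of collocation points.
t = np.linspace(0.0, 1.0, 10)
sig = 0.5
Omega_11 = nabla(1, 1)(t[:, None], t[None, :], sig)
Omega_10 = nabla(1, 0)(t[:, None], t[None, :], sig)   # phi'(t_i)^T phi(t_j)
```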
III. FORMULATION OF THE METHOD FOR THE LINEAR ODE CASE

Let us assume that a general approximate solution to (1) is of the form ŷ(t) = w^T φ(t) + b, where w and b are unknowns of the model that have to be determined. To obtain the optimal values of these parameters, collocation methods can be used [28], which assume a discretization of the interval [a, c] into a set of collocation points Υ = {a = t_1 < t_2 < ... < t_N = c}. Therefore, w and b are to be found by solving the following optimization problems.

For the IVP case:

    \underset{\hat{y}}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{N} \big( (\mathcal{L}[\hat{y}] - r)(t_i) \big)^2
    \text{s.t.} \;\; IC_\mu[\hat{y}(t)] = p_\mu, \quad \mu = 0, \ldots, m-1.    (5)

For the BVP case:

    \underset{\hat{y}}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{N} \big( (\mathcal{L}[\hat{y}] - r)(t_i) \big)^2
    \text{s.t.} \;\; BC_\mu[\hat{y}(t)] = q_\mu, \quad \mu = 0, \ldots, m-1    (6)

where N is the number of collocation points (which is equal to the number of training points) used to undertake the learning process. In what follows, we formulate the optimization problem in the LS-SVM framework for solving linear ODEs.

For notational convenience, let us list the following notations, which are used in the following sections:

    [\nabla_n^m K](t, s) = [\nabla_n^m K(u, v)] \big|_{u=t,\, v=s}
    [\Omega_n^m]_{i,j} = \nabla_n^m [K(u, v)] \big|_{u=t_i,\, v=t_j} = \frac{\partial^{n+m} K(u, v)}{\partial u^n\, \partial v^m} \bigg|_{u=t_i,\, v=t_j}
    [\Omega_0^0]_{i,j} = \nabla_0^0 [K(u, v)] \big|_{u=t_i,\, v=t_j} = K(t_i, t_j)

where [Ω_n^m]_{i,j} denotes the (i, j)-th entry of the matrix Ω_n^m. The notation M_{k:l,m:n} is used for selecting the submatrix of matrix M consisting of rows k to l and columns m to n. M_{i,:} denotes the i-th row of matrix M and M_{:,j} denotes the j-th column of matrix M.

A. First-Order IVP

As a first example, consider the following first-order IVP:

    y'(t) - f_1(t)\, y(t) = r(t), \qquad y(a) = p_1, \qquad a \le t \le c.    (7)

In the LS-SVM framework, the approximate solution can be obtained by solving the following optimization problem:

    \underset{w, b, e}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e
    \text{s.t.} \;\; w^T \varphi'(t_i) = f_1(t_i) \big[ w^T \varphi(t_i) + b \big] + r(t_i) + e_i, \quad i = 2, \ldots, N
                 w^T \varphi(t_1) + b = p_1.    (8)

This problem is obtained by combining the LS-SVM cost function with constraints constructed by imposing that the approximate solution ŷ(t) = w^T φ(t) + b, given by the LS-SVM model, satisfies the given differential equation and the corresponding initial condition at the collocation points {t_i}_{i=1}^{N}. Problem (8) is a quadratic minimization under linear equality constraints, which enables an efficient solution.

Lemma 3.1: Given a positive definite kernel function K: R × R → R with K(t, s) = φ(t)^T φ(s) and a regularization constant γ ∈ R_+, the solution to (8) is obtained by solving the following dual problem:

    \begin{bmatrix} K + I_{N-1}/\gamma & h_{p_1} & -\mathbf{f}_1 \\ h_{p_1}^T & [\Omega_0^0]_{1,1} & 1 \\ -\mathbf{f}_1^T & 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \\ b \end{bmatrix} = \begin{bmatrix} r \\ p_1 \\ 0 \end{bmatrix}    (9)

with

    \alpha = [\alpha_2, \ldots, \alpha_N]^T, \quad \mathbf{f}_1 = [f_1(t_2), \ldots, f_1(t_N)]^T \in R^{N-1}
    r = [r(t_2), \ldots, r(t_N)]^T \in R^{N-1}
    K = \tilde{\Omega}_1^1 - \tilde{\Omega}_1^0 D_1 - D_1 \tilde{\Omega}_0^1 + D_1 \tilde{\Omega}_0^0 D_1
    h_{p_1} = \big( [\Omega_0^1]_{1,2:N} \big)^T - D_1 \big( [\Omega_0^0]_{1,2:N} \big)^T.

D_1 is a diagonal matrix with the elements of f_1 on the main diagonal, [Ω_n^m]_{1,2:N} = [[Ω_n^m]_{1,2}, ..., [Ω_n^m]_{1,N}], and Ω̃_n^m = [Ω_n^m]_{2:N,2:N} for n, m = 0, 1. Also note that K ∈ R^{(N-1)×(N-1)} and h_{p_1} ∈ R^{N-1}.
Proof: The Lagrangian of the constrained optimization problem (8) is

    \mathcal{L}(w, b, e_i, \alpha_i, \beta) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{i=2}^{N} \alpha_i \Big[ w^T \big( \varphi'(t_i) - f_1(t_i) \varphi(t_i) \big) - f_1(t_i)\, b - r_i - e_i \Big] - \beta \big[ w^T \varphi(t_1) + b - p_1 \big]

where {α_i}_{i=2}^{N} and β are Lagrange multipliers and r_i = r(t_i) for i = 2, ..., N. The Karush–Kuhn–Tucker (KKT) optimality conditions are as follows:

    \frac{\partial \mathcal{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{i=2}^{N} \alpha_i \big( \varphi'(t_i) - f_1(t_i) \varphi(t_i) \big) + \beta\, \varphi(t_1)
    \frac{\partial \mathcal{L}}{\partial b} = 0 \;\rightarrow\; \sum_{i=2}^{N} \alpha_i f_1(t_i) - \beta = 0
    \frac{\partial \mathcal{L}}{\partial e_i} = 0 \;\rightarrow\; e_i = -\frac{\alpha_i}{\gamma}, \quad i = 2, \ldots, N
    \frac{\partial \mathcal{L}}{\partial \alpha_i} = 0 \;\rightarrow\; w^T \big( \varphi'(t_i) - f_1(t_i) \varphi(t_i) \big) - f_1(t_i)\, b - e_i = r_i, \quad i = 2, \ldots, N
    \frac{\partial \mathcal{L}}{\partial \beta} = 0 \;\rightarrow\; w^T \varphi(t_1) + b = p_1.

After elimination of the primal variables w and {e_i}_{i=2}^{N} and making use of Mercer's theorem, the solution is given in the dual by

    r_i = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_1^1]_{j,i} - f_1(t_i) \big( [\Omega_1^0]_{j,i} - f_1(t_j) [\Omega_0^0]_{j,i} \big) - f_1(t_j) [\Omega_0^1]_{j,i} \Big) + \beta \Big( [\Omega_0^1]_{1,i} - f_1(t_i) [\Omega_0^0]_{1,i} \Big) + \frac{\alpha_i}{\gamma} - f_1(t_i)\, b, \quad i = 2, \ldots, N
    p_1 = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_1^0]_{j,1} - f_1(t_j) [\Omega_0^0]_{j,1} \Big) + \beta\, [\Omega_0^0]_{1,1} + b
    0 = \sum_{j=2}^{N} \alpha_j f_1(t_j) - \beta

and writing these equations in matrix form gives the linear system in (9).

The model in the dual form becomes

    \hat{y}(t) = \sum_{i=2}^{N} \alpha_i \Big( [\nabla_1^0 K](t_i, t) - f_1(t_i) [\nabla_0^0 K](t_i, t) \Big) + \beta\, [\nabla_0^0 K](t_1, t) + b    (10)

where K is the kernel function.
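To make Lemma 3.1 concrete, the sketch below assembles the dual system (9) and evaluates the closed-form model (10) for the test equation y'(t) = -2y(t) + sin(t), y(0) = 1 (Problem 1 of Section VI, written in the form (7) with f_1(t) = -2 and r(t) = sin t). The closed-form RBF derivatives listed in Section VI are used. This is a minimal sketch under our own naming; the values of gamma and sigma are illustrative, not the tuned values reported in the paper.

```python
import numpy as np

# Gaussian RBF kernel and the derivatives needed for a first-order problem.
def k00(u, v, s):  # K(u, v)
    return np.exp(-(u - v) ** 2 / s ** 2)
def k10(u, v, s):  # dK/du = phi'(u)^T phi(v)
    return -2.0 * (u - v) / s ** 2 * k00(u, v, s)
def k01(u, v, s):  # dK/dv = phi(u)^T phi'(v)
    return 2.0 * (u - v) / s ** 2 * k00(u, v, s)
def k11(u, v, s):  # d^2K/(du dv) = phi'(u)^T phi'(v)
    return (2.0 / s ** 2 - 4.0 * (u - v) ** 2 / s ** 4) * k00(u, v, s)

# Test IVP: y' - f1(t) y = r(t) with f1(t) = -2, r(t) = sin t, y(0) = 1 on [0, 10].
f1 = lambda t: -2.0 * np.ones_like(t)
r = lambda t: np.sin(t)
a, c, p1 = 0.0, 10.0, 1.0
N, gamma, sigma = 25, 1e6, 1.0

t = np.linspace(a, c, N)
tc, t1 = t[1:], t[0]                     # collocation points t_2..t_N and t_1
D1 = np.diag(f1(tc))
U, V = tc[:, None], tc[None, :]

# Blocks of (9): K, h_p1 and the bordered system.
Kmat = k11(U, V, sigma) - k10(U, V, sigma) @ D1 - D1 @ k01(U, V, sigma) \
       + D1 @ k00(U, V, sigma) @ D1
h_p1 = k01(t1, tc, sigma) - f1(tc) * k00(t1, tc, sigma)

n = N - 1
A = np.zeros((n + 2, n + 2))
A[:n, :n] = Kmat + np.eye(n) / gamma
A[:n, n], A[:n, n + 1] = h_p1, -f1(tc)
A[n, :n], A[n, n], A[n, n + 1] = h_p1, k00(t1, t1, sigma), 1.0
A[n + 1, :n], A[n + 1, n] = -f1(tc), 1.0
rhs = np.concatenate([r(tc), [p1, 0.0]])
alpha, beta, b = np.split(np.linalg.solve(A, rhs), [n, n + 1])

# Closed-form model (10).
def y_hat(tt):
    tt = np.atleast_1d(tt)[None, :]
    basis = k10(tc[:, None], tt, sigma) - f1(tc)[:, None] * k00(tc[:, None], tt, sigma)
    return alpha @ basis + beta[0] * k00(t1, tt, sigma).ravel() + b[0]

# Compare with the analytic solution y(t) = (2 sin t - cos t)/5 + (6/5) e^{-2t}.
tt = np.linspace(a, c, 200)
max_err = np.max(np.abs(y_hat(tt) - ((2 * np.sin(tt) - np.cos(tt)) / 5 + 1.2 * np.exp(-2 * tt))))
```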

B. Second-Order IVP and BVP

1) IVP Case: Let us consider a second-order IVP of the form

    y''(t) = f_1(t)\, y'(t) + f_2(t)\, y(t) + r(t), \qquad t \in [a, c]
    y(a) = p_1, \qquad y'(a) = p_2.

The approximate solution, ŷ(t) = w^T φ(t) + b, is then obtained by solving the following optimization problem:

    \underset{w, b, e}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e
    \text{s.t.} \;\; w^T \varphi''(t_i) = f_1(t_i)\, w^T \varphi'(t_i) + f_2(t_i) \big[ w^T \varphi(t_i) + b \big] + r(t_i) + e_i, \quad i = 2, \ldots, N
                 w^T \varphi(t_1) + b = p_1
                 w^T \varphi'(t_1) = p_2.    (11)

Lemma 3.2: Given a positive definite kernel function K: R × R → R with K(t, s) = φ(t)^T φ(s) and a regularization constant γ ∈ R_+, the solution to (11) is obtained by solving the following dual problem:

    \begin{bmatrix} K + I_{N-1}/\gamma & h_{p_1} & h_{p_2} & -\mathbf{f}_2 \\ h_{p_1}^T & [\Omega_0^0]_{1,1} & [\Omega_1^0]_{1,1} & 1 \\ h_{p_2}^T & [\Omega_0^1]_{1,1} & [\Omega_1^1]_{1,1} & 0 \\ -\mathbf{f}_2^T & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ b \end{bmatrix} = \begin{bmatrix} r \\ p_1 \\ p_2 \\ 0 \end{bmatrix}    (12)

where

    \alpha = [\alpha_2, \ldots, \alpha_N]^T, \quad \mathbf{f}_1 = [f_1(t_2), \ldots, f_1(t_N)]^T \in R^{N-1}
    \mathbf{f}_2 = [f_2(t_2), \ldots, f_2(t_N)]^T \in R^{N-1}
    r = [r(t_2), \ldots, r(t_N)]^T \in R^{N-1}
    K = \tilde{\Omega}_2^2 - \tilde{\Omega}_2^1 D_1 - \tilde{\Omega}_2^0 D_2 - D_1 \tilde{\Omega}_1^2 - D_2 \tilde{\Omega}_0^2 + D_1 \tilde{\Omega}_1^1 D_1 + D_1 \tilde{\Omega}_1^0 D_2 + D_2 \tilde{\Omega}_0^1 D_1 + D_2 \tilde{\Omega}_0^0 D_2
    h_{p_1} = \big( [\Omega_0^2]_{1,2:N} \big)^T - D_1 \big( [\Omega_0^1]_{1,2:N} \big)^T - D_2 \big( [\Omega_0^0]_{1,2:N} \big)^T
    h_{p_2} = \big( [\Omega_1^2]_{1,2:N} \big)^T - D_1 \big( [\Omega_1^1]_{1,2:N} \big)^T - D_2 \big( [\Omega_1^0]_{1,2:N} \big)^T.

D_1 and D_2 are diagonal matrices with the elements of f_1 and f_2 on the main diagonal, respectively. Note that K ∈ R^{(N-1)×(N-1)} and h_{p_1}, h_{p_2} ∈ R^{N-1}. [Ω_n^m]_{1,2:N} = [[Ω_n^m]_{1,2}, ..., [Ω_n^m]_{1,N}] for n = 0, 1 and m = 0, 1, 2, and Ω̃_n^m = [Ω_n^m]_{2:N,2:N} for n, m = 0, 1, 2.

Proof: Consider the Lagrangian of (11)

    \mathcal{L}(w, b, e_i, \alpha_i, \beta_1, \beta_2) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{i=2}^{N} \alpha_i \Big[ w^T \big( \varphi''(t_i) - f_1(t_i) \varphi'(t_i) - f_2(t_i) \varphi(t_i) \big) - f_2(t_i)\, b - r_i - e_i \Big] - \beta_1 \big[ w^T \varphi(t_1) + b - p_1 \big] - \beta_2 \big[ w^T \varphi'(t_1) - p_2 \big]    (13)

where {α_i}_{i=2}^{N}, β_1, and β_2 are Lagrange multipliers. The KKT optimality conditions are as follows:

    \frac{\partial \mathcal{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{i=2}^{N} \alpha_i \big( \varphi''_i - f_1(t_i) \varphi'_i - f_2(t_i) \varphi_i \big) + \beta_1 \varphi_1 + \beta_2 \varphi'_1
    \frac{\partial \mathcal{L}}{\partial b} = 0 \;\rightarrow\; \sum_{i=2}^{N} \alpha_i f_2(t_i) - \beta_1 = 0
    \frac{\partial \mathcal{L}}{\partial e_i} = 0 \;\rightarrow\; e_i = -\frac{\alpha_i}{\gamma}, \quad i = 2, \ldots, N
    \frac{\partial \mathcal{L}}{\partial \alpha_i} = 0 \;\rightarrow\; w^T \big( \varphi''_i - f_1(t_i) \varphi'_i - f_2(t_i) \varphi_i \big) - f_2(t_i)\, b - e_i = r_i, \quad i = 2, \ldots, N
    \frac{\partial \mathcal{L}}{\partial \beta_1} = 0 \;\rightarrow\; w^T \varphi_1 + b = p_1
    \frac{\partial \mathcal{L}}{\partial \beta_2} = 0 \;\rightarrow\; w^T \varphi'_1 = p_2

where φ_i = φ(t_i), φ'_i = φ'(t_i), and φ''_i = φ''(t_i) for i = 1, ..., N.

Applying the kernel trick and eliminating w and {e_i}_{i=2}^{N} leads to

    r_i = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_2^2]_{j,i} - f_1(t_i) \big( [\Omega_2^1]_{j,i} - f_1(t_j) [\Omega_1^1]_{j,i} - f_2(t_j) [\Omega_0^1]_{j,i} \big) - f_2(t_i) \big( [\Omega_2^0]_{j,i} - f_1(t_j) [\Omega_1^0]_{j,i} - f_2(t_j) [\Omega_0^0]_{j,i} \big) - f_1(t_j) [\Omega_1^2]_{j,i} - f_2(t_j) [\Omega_0^2]_{j,i} \Big) + \beta_1 \Big( [\Omega_0^2]_{1,i} - f_1(t_i) [\Omega_0^1]_{1,i} - f_2(t_i) [\Omega_0^0]_{1,i} \Big) + \beta_2 \Big( [\Omega_1^2]_{1,i} - f_1(t_i) [\Omega_1^1]_{1,i} - f_2(t_i) [\Omega_1^0]_{1,i} \Big) + \frac{\alpha_i}{\gamma} - f_2(t_i)\, b, \quad i = 2, \ldots, N
    p_1 = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_2^0]_{j,1} - f_1(t_j) [\Omega_1^0]_{j,1} - f_2(t_j) [\Omega_0^0]_{j,1} \Big) + \beta_1 [\Omega_0^0]_{1,1} + \beta_2 [\Omega_1^0]_{1,1} + b
    p_2 = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_2^1]_{j,1} - f_1(t_j) [\Omega_1^1]_{j,1} - f_2(t_j) [\Omega_0^1]_{j,1} \Big) + \beta_1 [\Omega_0^1]_{1,1} + \beta_2 [\Omega_1^1]_{1,1}
    0 = \sum_{j=2}^{N} \alpha_j f_2(t_j) - \beta_1.

Finally, writing these equations in matrix form results in the linear system (12).

The LS-SVM model for the solution and its derivative in the dual form become

    \hat{y}(t) = \sum_{i=2}^{N} \alpha_i \Big( [\nabla_2^0 K](t_i, t) - f_1(t_i) [\nabla_1^0 K](t_i, t) - f_2(t_i) [\nabla_0^0 K](t_i, t) \Big) + \beta_1 [\nabla_0^0 K](t_1, t) + \beta_2 [\nabla_1^0 K](t_1, t) + b
    \frac{d\hat{y}(t)}{dt} = \sum_{i=2}^{N} \alpha_i \Big( [\nabla_2^1 K](t_i, t) - f_1(t_i) [\nabla_1^1 K](t_i, t) - f_2(t_i) [\nabla_0^1 K](t_i, t) \Big) + \beta_1 [\nabla_0^1 K](t_1, t) + \beta_2 [\nabla_1^1 K](t_1, t).

2) BVP Case: Consider the second-order BVP of ODEs of the form

    y''(t) = f_1(t)\, y'(t) + f_2(t)\, y(t) + r(t), \qquad t \in [a, c]
    y(a) = p_1, \qquad y(c) = q_1.

Then the parameters of the closed-form approximation of the solution can be obtained by solving the following optimization problem:

    \underset{w, b, e}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e
    \text{s.t.} \;\; w^T \varphi''(t_i) = f_1(t_i)\, w^T \varphi'(t_i) + f_2(t_i) \big[ w^T \varphi(t_i) + b \big] + r(t_i) + e_i, \quad i = 2, \ldots, N-1
                 w^T \varphi(t_1) + b = p_1
                 w^T \varphi(t_N) + b = q_1.    (14)

The same procedure can be applied to derive the Lagrangian and afterward the KKT optimality conditions. Then, one can show that the solution to (14) is obtained by solving the following linear system:

    \begin{bmatrix} K + I_{N-2}/\gamma & h_{p_1} & h_{q_1} & -\mathbf{f}_2 \\ h_{p_1}^T & [\Omega_0^0]_{1,1} & [\Omega_0^0]_{N,1} & 1 \\ h_{q_1}^T & [\Omega_0^0]_{1,N} & [\Omega_0^0]_{N,N} & 1 \\ -\mathbf{f}_2^T & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ b \end{bmatrix} = \begin{bmatrix} r \\ p_1 \\ q_1 \\ 0 \end{bmatrix}

where

    \alpha = [\alpha_2, \ldots, \alpha_{N-1}]^T
    \mathbf{f}_1 = [f_1(t_2), \ldots, f_1(t_{N-1})]^T \in R^{N-2}
    \mathbf{f}_2 = [f_2(t_2), \ldots, f_2(t_{N-1})]^T \in R^{N-2}
    r = [r(t_2), \ldots, r(t_{N-1})]^T \in R^{N-2}
    K = \tilde{\Omega}_2^2 - \tilde{\Omega}_2^1 D_1 - \tilde{\Omega}_2^0 D_2 - D_1 \tilde{\Omega}_1^2 - D_2 \tilde{\Omega}_0^2 + D_1 \tilde{\Omega}_1^1 D_1 + D_1 \tilde{\Omega}_1^0 D_2 + D_2 \tilde{\Omega}_0^1 D_1 + D_2 \tilde{\Omega}_0^0 D_2
    h_{p_1} = \big( [\Omega_0^2]_{1,2:N-1} \big)^T - D_1 \big( [\Omega_0^1]_{1,2:N-1} \big)^T - D_2 \big( [\Omega_0^0]_{1,2:N-1} \big)^T
    h_{q_1} = \big( [\Omega_0^2]_{N,2:N-1} \big)^T - D_1 \big( [\Omega_0^1]_{N,2:N-1} \big)^T - D_2 \big( [\Omega_0^0]_{N,2:N-1} \big)^T.

D_1 and D_2 are diagonal matrices with the elements of f_1 and f_2 on the main diagonal, respectively. Note that K ∈ R^{(N-2)×(N-2)} and h_{p_1}, h_{q_1} ∈ R^{N-2}. [Ω_n^m]_{1,2:N-1} = [[Ω_n^m]_{1,2}, ..., [Ω_n^m]_{1,N-1}] and [Ω_n^m]_{N,2:N-1} = [[Ω_n^m]_{N,2}, ..., [Ω_n^m]_{N,N-1}] for n = 0, 1 and m = 0, 1, 2, and Ω̃_n^m = [Ω_n^m]_{2:N-1,2:N-1} for n, m = 0, 1, 2.

The LS-SVM model for the solution and its derivative are expressed in dual form as

    \hat{y}(t) = \sum_{i=2}^{N-1} \alpha_i \Big( [\nabla_2^0 K](t_i, t) - f_1(t_i) [\nabla_1^0 K](t_i, t) - f_2(t_i) [\nabla_0^0 K](t_i, t) \Big) + \beta_1 [\nabla_0^0 K](t_1, t) + \beta_2 [\nabla_0^0 K](t_N, t) + b
    \frac{d\hat{y}(t)}{dt} = \sum_{i=2}^{N-1} \alpha_i \Big( [\nabla_2^1 K](t_i, t) - f_1(t_i) [\nabla_1^1 K](t_i, t) - f_2(t_i) [\nabla_0^1 K](t_i, t) \Big) + \beta_1 [\nabla_0^1 K](t_1, t) + \beta_2 [\nabla_0^1 K](t_N, t).

C. m-th Order Linear ODE

Let us now consider the general m-th order IVP of the following form:

    y^{(m)}(t) - \sum_{i=1}^{m} f_i(t)\, y^{(m-i)}(t) = r(t), \qquad t \in [a, c]
    y(a) = p_1
    y^{(i-1)}(a) = p_i, \quad i = 2, \ldots, m.    (15)

The approximate solution can be obtained by solving the following optimization problem:

    \underset{w, b, e}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e
    \text{s.t.} \;\; w^T \varphi^{(m)}(t_i) = w^T \sum_{k=1}^{m} f_k(t_i)\, \varphi^{(m-k)}(t_i) + f_m(t_i)\, b + r(t_i) + e_i, \quad i = 2, \ldots, N
                 w^T \varphi(t_1) + b = p_1
                 w^T \varphi^{(i-1)}(t_1) = p_i, \quad i = 2, \ldots, m.    (16)

Lemma 3.3: Given a positive definite kernel function K: R × R → R with K(t, s) = φ(t)^T φ(s) and a regularization constant γ ∈ R_+, the solution to (16) is obtained by solving the following dual problem:

    \begin{bmatrix} K + I_{N-1}/\gamma & K_{in} & -\mathbf{f}_m \\ K_{in}^T & \Omega_c & C \\ -\mathbf{f}_m^T & C^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \\ b \end{bmatrix} = \begin{bmatrix} r \\ p \\ 0 \end{bmatrix}    (17)

with

    \mathbf{f}_k = [f_k(t_2), \ldots, f_k(t_N)]^T \in R^{N-1}, \quad k = 1, \ldots, m
    h_{p_\ell} = \big( [\Omega_{\ell-1}^{m}]_{1,2:N} \big)^T - \sum_{k=1}^{m} D_k \big( [\Omega_{\ell-1}^{m-k}]_{1,2:N} \big)^T, \quad \ell = 1, \ldots, m
    K_{in} = [h_{p_1}, \ldots, h_{p_m}] \in R^{(N-1) \times m}, \quad h_{p_\ell} \in R^{N-1}
    p = [p_1, \ldots, p_m]^T \in R^m, \quad C = [1, 0, \ldots, 0]^T \in R^m
    r = [r(t_2), \ldots, r(t_N)]^T \in R^{N-1}, \quad \alpha = [\alpha_2, \ldots, \alpha_N]^T
    \beta = [\beta_1, \ldots, \beta_m]^T, \quad \bar{\Omega}_1 = [\tilde{\Omega}_0^m, \ldots, \tilde{\Omega}_{m-1}^m]^T
    \bar{D} = [D_m, \ldots, D_1], \quad \bar{\Omega} = \begin{bmatrix} \tilde{\Omega}_0^0 & \cdots & \tilde{\Omega}_0^{m-1} \\ \vdots & \ddots & \vdots \\ \tilde{\Omega}_{m-1}^0 & \cdots & \tilde{\Omega}_{m-1}^{m-1} \end{bmatrix}
    K = \tilde{\Omega}_m^m - \bar{D}\,\bar{\Omega}_1 - \bar{\Omega}_1^T \bar{D}^T + \bar{D}\,\bar{\Omega}\,\bar{D}^T
    \Omega_c = \begin{bmatrix} [\Omega_0^0]_{1,1} & \cdots & [\Omega_{m-1}^{0}]_{1,1} \\ \vdots & \ddots & \vdots \\ [\Omega_0^{m-1}]_{1,1} & \cdots & [\Omega_{m-1}^{m-1}]_{1,1} \end{bmatrix}_{m \times m}.

{D_i}_{i=1}^{m} are diagonal matrices with the elements of {f_i}_{i=1}^{m} on the main diagonal, respectively. Ω̄_1, Ω̄, and D̄ are block matrices, and Ω̃_n^m = [Ω_n^m]_{2:N,2:N}. Note that K ∈ R^{(N-1)×(N-1)}.

Proof: The Lagrangian for (16) is given by

    \mathcal{L}(w, b, e_i, \alpha_i, \beta_i) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{i=2}^{N} \alpha_i \Big[ w^T \Big( \varphi^{(m)}(t_i) - \sum_{k=1}^{m} f_k(t_i)\, \varphi^{(m-k)}(t_i) \Big) - f_m(t_i)\, b - r_i - e_i \Big] - \beta_1 \big[ w^T \varphi(t_1) + b - p_1 \big] - \beta_2 \big[ w^T \varphi'(t_1) - p_2 \big] - \cdots - \beta_m \big[ w^T \varphi^{(m-1)}(t_1) - p_m \big].

Eliminating w and {e_i}_{i=2}^{N} from the corresponding KKT optimality conditions yields the following set of equations:

    r_i = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_m^m]_{j,i} - \sum_{\ell=1}^{m} f_\ell(t_i) \Big( [\Omega_m^{m-\ell}]_{j,i} - \sum_{k=1}^{m} f_k(t_j) [\Omega_{m-k}^{m-\ell}]_{j,i} \Big) - \sum_{k=1}^{m} f_k(t_j) [\Omega_{m-k}^{m}]_{j,i} \Big) + \sum_{\ell=1}^{m} \beta_\ell \Big( [\Omega_{\ell-1}^{m}]_{1,i} - \sum_{k=1}^{m} f_k(t_i) [\Omega_{\ell-1}^{m-k}]_{1,i} \Big) + \frac{\alpha_i}{\gamma} - f_m(t_i)\, b, \quad i = 2, \ldots, N
    p_1 = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_m^{0}]_{j,1} - \sum_{k=1}^{m} f_k(t_j) [\Omega_{m-k}^{0}]_{j,1} \Big) + \sum_{k=1}^{m} \beta_k [\Omega_{k-1}^{0}]_{1,1} + b
    \vdots
    p_m = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_m^{m-1}]_{j,1} - \sum_{k=1}^{m} f_k(t_j) [\Omega_{m-k}^{m-1}]_{j,1} \Big) + \sum_{k=1}^{m} \beta_k [\Omega_{k-1}^{m-1}]_{1,1}
    0 = \sum_{j=2}^{N} \alpha_j f_m(t_j) - \beta_1.

Rewriting the above system in matrix form results in (17).

The LS-SVM model for the solution and its derivatives can be expressed in dual form as follows:

    \hat{y}(t) = \sum_{i=2}^{N} \alpha_i \Big( [\nabla_m^0 K](t_i, t) - \sum_{k=1}^{m} f_k(t_i) [\nabla_{m-k}^0 K](t_i, t) \Big) + \sum_{k=1}^{m} \beta_k [\nabla_{k-1}^0 K](t_1, t) + b
    \frac{d^p \hat{y}(t)}{dt^p} = \sum_{i=2}^{N} \alpha_i \Big( [\nabla_m^p K](t_i, t) - \sum_{k=1}^{m} f_k(t_i) [\nabla_{m-k}^p K](t_i, t) \Big) + \sum_{k=1}^{m} \beta_k [\nabla_{k-1}^p K](t_1, t), \quad p = 1, \ldots, m-1.

Lemma 3.4: The matrix K is positive semi-definite.

Proof: Let D̃ = [D̄, -I_{N-1}] and

    \tilde{\Omega} = \begin{bmatrix} \bar{\Omega} & \bar{\Omega}_1 \\ \bar{\Omega}_1^T & \tilde{\Omega}_m^m \end{bmatrix}.

Then the matrix K can be written as K = D̃ Ω̃ D̃^T. To show that x^T K x ≥ 0 for any x, it is sufficient to show that x̃^T Ω̃ x̃ ≥ 0 for any x̃, because x̃ = D̃^T x is included as a special case. Now consider the matrices of evaluations of the feature map and its derivatives, Φ^{(i)} = [φ^{(i)}(t_2), ..., φ^{(i)}(t_N)] for i = 0, ..., m, and denote their concatenation by Φ̃ = [Φ^{(0)}, ..., Φ^{(m)}]. Then ‖Φ̃ x̃‖² = x̃^T Φ̃^T Φ̃ x̃ = x̃^T Ω̃ x̃ holds, where the last equality follows from an application of the kernel trick. Since the norm of any vector is a nonnegative real number, ‖Φ̃ x̃‖² is greater than or equal to zero, and therefore x̃^T Ω̃ x̃ is nonnegative, which concludes the proof.
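Lemma 3.4 is easy to sanity-check numerically: build K for a small example and verify that its eigenvalues are nonnegative up to round-off. The sketch below does this for the first-order construction of Lemma 3.1 (m = 1) with the RBF kernel; the coefficient function f_1 and the grid are arbitrary choices of ours for the check.

```python
import numpy as np

def k00(u, v, s): return np.exp(-(u - v) ** 2 / s ** 2)
def k10(u, v, s): return -2 * (u - v) / s ** 2 * k00(u, v, s)
def k01(u, v, s): return  2 * (u - v) / s ** 2 * k00(u, v, s)
def k11(u, v, s): return (2 / s ** 2 - 4 * (u - v) ** 2 / s ** 4) * k00(u, v, s)

# K from Lemma 3.1: K = Om_1^1 - Om_1^0 D1 - D1 Om_0^1 + D1 Om_0^0 D1.
t = np.linspace(0.0, 1.0, 15)[1:]      # collocation points t_2, ..., t_N
sigma = 0.5
D1 = np.diag(np.cos(3 * t))            # arbitrary smooth coefficient f_1(t) = cos(3t)
U, V = t[:, None], t[None, :]
K = k11(U, V, sigma) - k10(U, V, sigma) @ D1 - D1 @ k01(U, V, sigma) \
    + D1 @ k00(U, V, sigma) @ D1

eigvals = np.linalg.eigvalsh(K)        # K is a symmetric Gram matrix
print(eigvals.min())                   # expected to be >= 0 up to round-off
```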
IV. FORMULATION OF THE METHOD FOR THE NONLINEAR ODE CASE

In this section, we formulate an optimization problem based on LS-SVMs for solving nonlinear first-order ODEs of the following form:

    y' = f(t, y), \qquad y(a) = p_1, \qquad a \le t \le c.    (18)

One starts by assuming the approximate solution to be of the form ŷ(t) = w^T φ(t) + b. Additional unknowns y_i are introduced to keep the constraints linear in w. This yields the following nonlinear optimization problem:

    \underset{w, b, e, \xi, y_i}{\text{minimize}} \;\; \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e + \frac{\gamma}{2} \xi^T \xi
    \text{s.t.} \;\; w^T \varphi'(t_i) = f(t_i, y_i) + e_i, \quad i = 2, \ldots, N
                 w^T \varphi(t_1) + b = p_1
                 y_i = w^T \varphi(t_i) + b + \xi_i, \quad i = 2, \ldots, N.    (19)

The Lagrangian of the constrained optimization problem (19) becomes

    \mathcal{L}(w, b, e_i, \xi_i, y_i, \alpha_i, \eta_i, \beta) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e + \frac{\gamma}{2} \xi^T \xi - \sum_{i=2}^{N} \alpha_i \big[ w^T \varphi'(t_i) - f(t_i, y_i) - e_i \big] - \beta \big[ w^T \varphi(t_1) + b - p_1 \big] - \sum_{i=2}^{N} \eta_i \big[ y_i - w^T \varphi(t_i) - b - \xi_i \big].

After obtaining the KKT optimality conditions, eliminating the primal variables w, {e_i}_{i=2}^{N}, and {ξ_i}_{i=2}^{N}, and making use of Mercer's theorem, the solution is obtained in the dual by solving the following nonlinear system of equations:

    \begin{bmatrix} \hat{\Omega}_{11} & \tilde{\Omega}_1^0 & h_1^T & 0_{N-1} & 0_{(N-1)\times(N-1)} \\ (\tilde{\Omega}_1^0)^T & \hat{\Omega}_{00} & h_0^T & 1_{N-1} & -I_{N-1} \\ h_1 & h_0 & [\Omega_0^0]_{1,1} & 1 & 0_{N-1}^T \\ 0_{N-1}^T & 1_{N-1}^T & 1 & 0 & 0_{N-1}^T \\ D(y) & I_{N-1} & 0_{N-1} & 0_{N-1} & 0_{(N-1)\times(N-1)} \end{bmatrix} \begin{bmatrix} \alpha \\ \eta \\ \beta \\ b \\ y \end{bmatrix} = \begin{bmatrix} f(y) \\ 0_{N-1} \\ p_1 \\ 0 \\ 0_{N-1} \end{bmatrix}    (20)

where

    \hat{\Omega}_{11} = \tilde{\Omega}_1^1 + I_{N-1}/\gamma, \quad \hat{\Omega}_{00} = \tilde{\Omega}_0^0 + I_{N-1}/\gamma
    D(y) = \mathrm{diag}(f'(y))
    f(y) = [f(t_2, y_2), \ldots, f(t_N, y_N)]^T
    f'(y) = \Big[ \frac{\partial f(t, y)}{\partial y}\Big|_{t=t_2,\, y=y_2}, \ldots, \frac{\partial f(t, y)}{\partial y}\Big|_{t=t_N,\, y=y_N} \Big]
    \alpha = [\alpha_2, \ldots, \alpha_N]^T, \quad \eta = [\eta_2, \ldots, \eta_N]^T
    y = [y_2, \ldots, y_N]^T, \quad \tilde{\Omega}_0^0 = [\Omega_0^0]_{2:N,2:N}
    \tilde{\Omega}_1^1 = [\Omega_1^1]_{2:N,2:N}, \quad \tilde{\Omega}_1^0 = [\Omega_1^0]_{2:N,2:N}
    h_0 = \big[ [\Omega_0^0]_{1,2}, \ldots, [\Omega_0^0]_{1,N} \big]
    h_1 = \big[ [\Omega_0^1]_{1,2}, \ldots, [\Omega_0^1]_{1,N} \big]
    0_{N-1} = [0, \ldots, 0]^T \in R^{N-1}.

The nonlinear system (20), which consists of 3N - 1 equations in the 3N - 1 unknowns (α, η, β, b, y), is solved by Newton's method. The model in the dual form becomes

    \hat{y}(t) = \sum_{i=2}^{N} \alpha_i [\nabla_1^0 K](t_i, t) + \sum_{i=2}^{N} \eta_i [\nabla_0^0 K](t_i, t) + \beta [\nabla_0^0 K](t_1, t) + b.    (21)

V. PRACTICAL IMPLEMENTATION AND MODEL SELECTION

A. Solution on a Long Time Interval

Consider now the situation where a given differential equation has to be solved on a large time interval [a, c]. It should be noted that in order to improve the accuracy (or maintain the same order of accuracy on the whole domain), we then need to increase the number of collocation points. This approach, however, leads to a larger system of equations.

In order to implement the proposed method efficiently for problems involving large time intervals, the domain decomposition technique is applied [29]. At first, the domain [a, c] is decomposed into S segments. We assume that the approximate solution on the k-th segment has the form ŷ_k(t) = w_k^T φ(t) + b_k. The problem is then solved in each sub-domain using the method described in the previous sections. The computed approximate solution at the final point of the k-th sub-domain is used as the starting point (initial value) for the consecutive sub-domain.

Utilizing this approach results in solving S smaller systems of equations, which is computationally more efficient than solving the very large system of equations obtained by considering the whole domain (with the same total number of collocation points). The procedure is outlined in Algorithm 1.
Algorithm 1: Approximating the Solution on a Large Interval
1: Decompose the domain [a, c] into S sub-domains.
2: Set Δ := (c - a)/S, t_in := a, y_in := p_1, t_f := t_in + Δ.
3: for k = 1 to S do
4:   Obtain an LS-SVM model for the k-th segment [t_in, t_f], i.e., ŷ_k(t) = w_k^T φ_k(t) + b_k.
5:   Set t_in := t_f, y_in := ŷ_k(t_f), t_f := t_in + Δ.
6: end for
7: For a given test point t:
   1) check to which segment it belongs;
   2) use the corresponding model to compute the approximate solution at that point.
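A minimal Python sketch of the segment-by-segment driver of Algorithm 1 is given below. The routine solve_segment is a placeholder assumed here (not defined in the paper): it is expected to fit one LS-SVM model on a sub-interval with the given initial value (e.g., via the construction of Lemma 3.1) and return a callable local model.

```python
def solve_long_interval(solve_segment, a, c, p1, n_segments):
    """Algorithm 1: decompose [a, c] into S segments and chain LS-SVM models.

    solve_segment(t_in, t_f, y_in) is assumed to fit one LS-SVM model on
    [t_in, t_f] with initial value y(t_in) = y_in and to return a callable
    approximation y_hat_k(t).  It is a hypothetical helper, not part of the paper."""
    delta = (c - a) / n_segments
    models = []
    t_in, y_in = a, p1
    for _ in range(n_segments):
        t_f = t_in + delta
        y_hat_k = solve_segment(t_in, t_f, y_in)
        models.append(y_hat_k)
        t_in, y_in = t_f, y_hat_k(t_f)        # step 5: end value seeds the next segment

    def y_hat(t):
        # step 7: find the segment containing t and evaluate the local model
        k = min(int((t - a) // delta), n_segments - 1)
        return models[k](t)
    return y_hat
```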
so the following relations hold:
2(u − v)
∇10 [K (u, v)] = − K (u, v)
B. Parameter Tuning σ2
2(u − v)
The performance of the LS-SVM model depends on the ∇01 [K (u, v)] = K (u, v)
 σ
2
choice of the tuning parameters. In this paper, for all experi- 
ments, the Gaussian RBF kernel is used. Therefore, a model is 4(u − v)2 2
∇2 [K (u, v)] =
0
− 2 K (u, v).
determined by the regularization parameter γ and the kernel σ4 σ
bandwidth σ . MATLAB 2010b is used to implement the code and all compu-
It should be noted that unlike the regression case, we do tations were carried out on a windows 7 system with Intel-core
not have target values, and consequently we do not have i7 CPU and 4.00 GB RAM.
noise. Therefore, a quite large value should be taken for
the regularization constant γ so that the error e is sharply
A. First-Order ODEs
minimized or equivalently the constraints are well satisfied.
In all the experiments, the chosen value for γ was 1010 , Problem 1: Consider the following first-order ODE
except for the problem with large interval for which γ is set [19, eq. 2]:
to 105 in order to avoid ill conditioning. d
y(t) + 2y(t) = sin(t) y(0) = 1, t ∈ [0, 10].
Therefore, the only parameter left that has to be tuned is dt
the kernel bandwidth. In this paper, the optimal values of σ The approximate solution obtained by the proposed method
are obtained by evaluating the performance of the model on is compared with the true solution and results are depicted
a validation set using a meaningful range of possible (σ ). in Fig. 1. From the obtained results, it is apparent that our
The validation set is defined to be the set of midpoints method outperforms the method in [19] in terms of accu-
V ≡ {v i = (ti + ti+1 )/2, i = 1, . . . , N − 1} where {ti }i=1
N
are racy (see [19, Fig. 6]), although training was performed using
training points. The values that minimize the mean squared fewer points (one fourth). In addition, we also considered
error (MSE) on this validation set are then selected. points outside the training interval, and Fig. 1(d) and (e) shows
Remark 5.1: In some cases, an extremely large value for γ , that the extrapolation error remains low for the points near the
normally greater than 107 , can make the matrix in (9) close domain of equation. As expected, by increasing the number of
to singular. mesh points (training points), the error decreases both inside
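The bandwidth selection rule described above amounts to a small grid search. The sketch below is our own illustration of it: fit and residual are assumed helper routines (fit trains the closed-form model for a given σ; residual evaluates the ODE residual of the model at the validation points, one natural choice of validation error here since no target values are available).

```python
import numpy as np

def select_sigma(t_train, sigmas, fit, residual):
    """Grid search over the kernel bandwidth on the midpoint validation set.

    fit(t_train, sigma) -> callable y_hat(.)   (assumed helper)
    residual(y_hat, v)  -> ODE residual of y_hat at validation points v (assumed helper)
    """
    v = 0.5 * (t_train[:-1] + t_train[1:])      # midpoints of the training grid
    best_sigma, best_mse = None, np.inf
    for sigma in sigmas:
        y_hat = fit(t_train, sigma)
        mse = np.mean(residual(y_hat, v) ** 2)  # MSE on the validation set
        if mse < best_mse:
            best_sigma, best_mse = sigma, mse
    return best_sigma, best_mse
```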
VI. NUMERICAL RESULTS

In this section, we test the performance of the proposed method on seven problems: four first-order and three second-order ODEs. For the first three problems and Problem 5, a comparison is made between the solutions obtained in [19] and our computed solutions. The numerical results for Problems 4 and 6 are compared with those given in [18]. Problem 7, which has no analytic solution and is a singular problem, is solved and the computed solution is compared with that reported in [30]. In order to show the approximation and generalization capabilities of the proposed method, we compare the exact solution with the computed solution inside and outside of the domain of consideration. Furthermore, the proposed method is successfully applied to solve Problem 1 for a very large time interval. For all experiments, the RBF kernel is used, K(u, v) = exp(-(u - v)²/σ²), so the following relations hold:

    \nabla_1^0 [K(u, v)] = -\frac{2(u - v)}{\sigma^2} K(u, v)
    \nabla_0^1 [K(u, v)] = \frac{2(u - v)}{\sigma^2} K(u, v)
    \nabla_2^0 [K(u, v)] = \Big( \frac{4(u - v)^2}{\sigma^4} - \frac{2}{\sigma^2} \Big) K(u, v).

MATLAB 2010b is used to implement the code, and all computations were carried out on a Windows 7 system with an Intel Core i7 CPU and 4.00 GB of RAM.
A. First-Order ODEs

Problem 1: Consider the following first-order ODE [19, eq. 2]:

    \frac{d}{dt} y(t) + 2 y(t) = \sin(t), \qquad y(0) = 1, \qquad t \in [0, 10].

The approximate solution obtained by the proposed method is compared with the true solution, and the results are depicted in Fig. 1. From the obtained results, it is apparent that our method outperforms the method in [19] in terms of accuracy (see [19, Fig. 6]), although training was performed using fewer points (one fourth). In addition, we also considered points outside the training interval, and Fig. 1(d) and (e) shows that the extrapolation error remains low for points near the domain of the equation. As expected, by increasing the number of mesh points (training points), the error decreases both inside and outside of the training interval. Fig. 1(c) and (f) indicates the performance of the method when nonuniform partitioning is used for creating the training points.

Fig. 1. Numerical results for Problem 1. (a) Ten equidistant points in [0, 10] are used for training. (b) 25 equidistant points in [0, 10] are used for training. (c) Nonuniform partition of [0, 10] using ten points, which are used for training. (d) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into nine equal parts. (e) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into 24 equal parts. (f) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into nine nonuniform parts.

Problem 2: Consider the following first-order differential equation with nonlinear sinusoidal excitation [19, eq. 3]:

    \frac{d}{dt} y(t) + 2 y(t) = t \sin^3\!\Big(\frac{t}{2}\Big), \qquad y(0) = 1, \qquad t \in [0, 10].

The interval [0, 10] is discretized into N = 20 points t_1 = 0, ..., t_20 = 10 using the grid t_i = (i - 1)h, i = 1, ..., N, where h = 10/(N - 1). In Fig. 2(a), we compare the exact solution with the computed solution at the grid points (circles) as well as at other points inside and outside the domain of the equation. The obtained absolute errors for points inside and outside the domain [0, 10] are tabulated in Table I. It is clear that the solution is of higher accuracy compared to the solution obtained in [19], despite the fact that fewer training points are used. (Note that in [19], 100 equidistant points are used for training, and the maximum absolute error shown in [19, Fig. 13] is approximately 25 × 10^-2.)

TABLE I
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEMS 2-4

Problem | Domain  | ||y - ŷ||_∞  | MSE          | STD
2       | Inside  | 4.56 × 10^-3 | 1.47 × 10^-6 | 1.16 × 10^-3
        | Outside | 4.62 × 10^-1 | 3.85 × 10^-2 | 1.56 × 10^-1
3       | Inside  | 5.43 × 10^-3 | 8.94 × 10^-6 | 1.60 × 10^-3
        | Outside | 8.46 × 10^-2 | 1.49 × 10^-3 | 2.27 × 10^-2
4       | Inside  | 1.46 × 10^-4 | 8.15 × 10^-9 | 3.90 × 10^-5
        | Outside | 6.76 × 10^-2 | 5.53 × 10^-4 | 2.20 × 10^-2

Note: STD is the standard deviation.

Fig. 2. (a) Numerical results for Problem 2. Twenty equidistant points in [0, 10] are used for training. (b) Numerical results for Problem 3. Twenty equidistant points in [0, 0.5] are used for training. (c) Numerical results for Problem 4. Ten equidistant points in [0, 1] are used for training.

Problem 3: Consider the following nonlinear first-order ODE, which has no analytic solution [19, eq. 6]:

    \frac{d}{dt} y(t) = y(t)^2 + t^2, \qquad y(0) = 1, \qquad t \in [0, 0.5].

Twenty equidistant points in the given interval are used for the training process. The approximate solution obtained by the proposed method and the solution obtained by the MATLAB built-in solver ODE45 are displayed in Fig. 2(b). The obtained absolute errors for points inside and outside the domain [0, 0.5] are tabulated in Table I. The proposed method shows better performance than the method described in [19] in terms of accuracy, despite the fact that a much smaller number of training points is used. (Note that in [19], the problem is solved over the domain [0, 0.2] using 100 equidistant training points, and the maximum absolute error shown in [19, Fig. 22] is approximately 4 × 10^-2.)
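Since Problem 3 has no closed-form solution, a high-accuracy reference can be produced with a standard adaptive Runge-Kutta integrator. The paper uses MATLAB's ODE45; the sketch below uses the analogous RK45 method of SciPy, which is our substitution for illustration, not the authors' setup.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Problem 3: y' = y^2 + t^2, y(0) = 1 on [0, 0.5].
sol = solve_ivp(lambda t, y: y ** 2 + t ** 2, (0.0, 0.5), [1.0],
                method='RK45', rtol=1e-10, atol=1e-12, dense_output=True)

t_ref = np.linspace(0.0, 0.5, 101)
y_ref = sol.sol(t_ref)[0]   # reference values to compare an approximate solution against
```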
Problem 4: Consider the following first-order ODE with time-varying coefficients [18, eq. 1]:

    \frac{d}{dt} y(t) + \Big( t + \frac{1 + 3t^2}{1 + t + t^3} \Big) y(t) = t^3 + 2t + t^2 \frac{1 + 3t^2}{1 + t + t^3}, \qquad y(0) = 1, \qquad t \in [0, 1].

In order to have a fair comparison with the results reported in [18], ten equidistant points in the given interval are used for the training process. The analytic solution and the solution obtained via the proposed method are displayed in Fig. 2(c). The obtained absolute errors for points inside and outside the domain [0, 1] are recorded in Table I, which shows the superiority of the proposed method over the method described in [18]. (Note that in [18, Fig. 2], the maximum absolute error outside the domain [0, 1] is approximately 12 × 10^-2.)

B. Second-Order ODEs

Problem 5: Consider the following second-order BVP with time-varying input signal [19, eq. 4]:

    \frac{d^2}{dt^2} y(t) + y(t) = 2 + 2 \sin(4t) \cos(3t)
    y(0) = 1, \qquad y(1) = 0.

Ten equidistant points in the given interval are used for the training process. The analytic solution and the solution obtained via the proposed method are displayed in Fig. 3(a). The obtained absolute errors for points inside and outside the domain [0, 1] are recorded in Table II, which shows the superiority of the proposed method over the method described in [19]. (Note that in [19], 100 equidistant points are used for training, and the maximum absolute error shown in [19, Fig. 17] is approximately 5 × 10^-1.)

Fig. 3. (a) Numerical results for Problem 5. Ten equidistant points in [0, 1] are used for training. (b) Numerical results for Problem 6. Ten equidistant points in [0, 2] are used for training. (c) Numerical results for Problem 7. Ten equidistant points in [0, 1] are used for training.

Problem 6: Consider the following second-order ODE with time-varying input signal [18, Problem 3]:

    \frac{d^2}{dt^2} y(t) + \frac{1}{5} \frac{d}{dt} y(t) + y(t) = -\frac{1}{5} e^{-t/5} \cos(t)
    y(0) = 1, \qquad y'(0) = 1.

Ten equidistant points in the interval [0, 2] are used for the training process. The analytic solution and the solution obtained by the proposed method are shown in Fig. 3(b). The obtained absolute errors for points inside and outside the domain [0, 2] are tabulated in Table II, which again shows the improvement of the proposed method over the method described in [18]. (Note that in [18, Fig. 4], the maximum absolute error outside the domain [0, 2] is 8 × 10^-4.)

Problem 7: Consider the following singular second-order ODE, which has no analytical closed-form solution [30, eq. 1]:

    \frac{d^2}{dt^2} y(t) + \frac{1}{t} \frac{d}{dt} y(t) - \frac{1}{t} \cos(t) = 0, \qquad y(0) = 0, \quad y'(0) = 1.

The exact solution is y(t) = \int_0^t \frac{\sin(x)}{x}\, dx.

Ten equidistant points in the interval [0, 1] are used as training points, and the obtained results are shown in Fig. 3(c) and recorded in Table II. The obtained maximum absolute error outside the domain [0, 1] is 6.51 × 10^-2, which is smaller than the value of 14 × 10^-1 shown in [30, Fig. 6].

TABLE II
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEMS 5-7

Problem | Domain  | Variable | ||y - ŷ||_∞  | MSE           | STD
5       | Inside  | y        | 1.14 × 10^-6 | 4.16 × 10^-13 | 6.43 × 10^-7
        |         | y'       | 4.81 × 10^-5 | 6.78 × 10^-11 | 8.21 × 10^-6
        | Outside | y        | 4.20 × 10^-2 | 2.64 × 10^-4  | 1.26 × 10^-2
        |         | y'       | 1.00 × 10^-1 | 3.27 × 10^-3  | 3.87 × 10^-2
6       | Inside  | y        | 5.88 × 10^-6 | 1.49 × 10^-11 | 1.63 × 10^-6
        |         | y'       | 7.34 × 10^-6 | 2.18 × 10^-11 | 3.28 × 10^-6
        | Outside | y        | 3.96 × 10^-4 | 2.39 × 10^-8  | 1.19 × 10^-4
        |         | y'       | 5.15 × 10^-4 | 7.11 × 10^-8  | 1.74 × 10^-4
7       | Inside  | y        | 6.64 × 10^-9 | 2.01 × 10^-16 | 4.07 × 10^-9
        |         | y'       | 8.41 × 10^-8 | 1.30 × 10^-15 | 3.59 × 10^-8
        | Outside | y        | 6.51 × 10^-2 | 3.90 × 10^-4  | 1.65 × 10^-2
        |         | y'       | 7.80 × 10^-2 | 7.31 × 10^-4  | 2.15 × 10^-2

Note: STD is the standard deviation.
C. Sensitivity of the Solution w.r.t. the Parameter

In order to illustrate the sensitivity of the result with respect to the parameter of the model (σ), we have plotted, for two examples, the MSE on the validation set versus the kernel bandwidth on logarithmic scales in Fig. 4. From this figure, it is apparent that there exists a range of σ for which the MSE on the validation set is quite small.

Fig. 4. Sensitivity of the obtained result with respect to the model parameter σ. log10(MSE) versus log10(σ) is plotted for Problems 2 and 6.

D. Large Interval

Let us consider Problem 1 when the time interval is [0, 10^5]. It is known in advance that the solution of this problem is oscillating. The problem is solved by decomposing the given domain of interest into S sub-domains. The problem is then solved on each sub-domain using N local collocation points. The execution time and the MSE for the training and test sets,

    \mathrm{MSE}_{\mathrm{train}} = \frac{\sum_{i=1}^{N \times S} \big( y(t_i) - \hat{y}(t_i) \big)^2}{N \times S}, \qquad \mathrm{MSE}_{\mathrm{test}} = \frac{\sum_{i=1}^{M} \big( y(t_i) - \hat{y}(t_i) \big)^2}{M}

where N × S is the total number of collocation points and M is the total number of test points over the interval [0, 10^5], are tabulated in Table III. The test set is the same for all the cases and consists of M = 5 × 10^5 points. It is apparent that when S is fixed and N increases, the accuracy improves whereas the execution time increases. The same pattern is observed when N is fixed and S increases. Fig. 5(a) and (b) shows the residual error e_t = y(t) - ŷ(t) when Problem 1 is solved over the interval [0, 10^5] using N = 50 local collocation points with S = 5000 sub-domains, and N = 500 local collocation points with S = 500 sub-domains, respectively. It should be noted that the result depicted in Fig. 5(a) is obtained much faster than that shown in Fig. 5(b).

TABLE III
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEM 1 WITH TIME INTERVAL [0, 10^5]. N IS THE NUMBER OF LOCAL COLLOCATION POINTS AND S IS THE NUMBER OF SUB-DOMAINS

N  | S    | CPU time | MSE (training) | MSE (test)
20 | 1000 | 5.5      | 2.4 × 10^-2    | 7.2 × 10^-2
   | 2000 | 10.6     | 1.3 × 10^-3    | 3.3 × 10^-3
   | 5000 | 29.5     | 8.4 × 10^-8    | 2.3 × 10^-7
30 | 1000 | 6.6      | 2.2 × 10^-2    | 5.9 × 10^-2
   | 2000 | 13.4     | 4.1 × 10^-6    | 1.3 × 10^-5
   | 5000 | 37.1     | 8.2 × 10^-9    | 2.7 × 10^-8
40 | 1000 | 9.6      | 5.8 × 10^-4    | 1.4 × 10^-3
   | 2000 | 20.1     | 1.7 × 10^-7    | 5.8 × 10^-7
   | 5000 | 54.2     | 2.3 × 10^-9    | 8.1 × 10^-9

Note: The execution time is in seconds.

Fig. 5. (a) Residual y(t) - ŷ(t) when Problem 1 is solved on the interval [0, 10^5] using 5000 sub-intervals and 50 local collocation points. (b) Obtained residual y(t) - ŷ(t) for the same problem using 500 sub-intervals and 500 local collocation points.

In Table IV, we analyze the situation where the total number of collocation points, i.e., N × S, in the given interval [0, 4000] is fixed. It can be seen that as the number of sub-domains increases (and the number of collocation points in each sub-domain decreases), the computational time decreases without losing the order of accuracy. In this case, the test set consists of M = 2 × 10^4 points.

TABLE IV
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEM 1 WITH TIME INTERVAL [0, 4000], WHILE THE TOTAL NUMBER OF COLLOCATION POINTS, i.e., N × S, IS CONSTANT

N   | S   | CPU time | MSE (training) | MSE (test)
800 | 10  | 85.5     | 1.36 × 10^-8   | 2.06 × 10^-8
400 | 20  | 26.1     | 1.37 × 10^-8   | 2.08 × 10^-8
20  | 400 | 2.06     | 1.68 × 10^-8   | 2.52 × 10^-8

Note: The execution time is in seconds.

VII. CONCLUSION

In this paper, a new method based on LS-SVMs was developed for solving general linear m-th order ODEs as well as first-order nonlinear ODEs. On the tested problems, the method proposed in this paper is more efficient than the methods described in [18] and [19]. The proposed method is also able to solve a differential equation on a large time interval while predicting the solution with high accuracy.

In the future, the proposed method may be extended to solve systems of ODEs and partial differential equations.

REFERENCES

[1] J. Cheng, M. R. Sayeh, M. R. Zargham, and Q. Cheng, "Real-time vector quantization and clustering based on ordinary differential equations," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2143-2148, Dec. 2011.
[2] C. Y. Lai, C. Xiang, and T. H. Lee, "Data-based identification and control of nonlinear systems via piecewise affine approximation," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2189-2200, Dec. 2011.
[3] J. C. Butcher, Numerical Methods for Ordinary Differential Equations, 2nd ed. Chichester, U.K.: Wiley, 2008.
[4] J. D. Lambert, Numerical Methods for Ordinary Differential Systems. New York: Wiley, 1991.
[5] M. M. Chawla and C. P. Katti, "Finite difference methods for two-point boundary value problems involving high order differential equations," BIT Numer. Math., vol. 19, no. 1, pp. 27-33, 1979.
[6] R. D. Russell and L. F. Shampine, "A collocation method for boundary value problems," Numer. Math., vol. 19, no. 1, pp. 1-28, 1972.
[7] C. W. Gear, "Hybrid methods for initial value problems in ordinary differential equations," SIAM J. Numer. Anal., vol. 2, no. 1, pp. 69-86, 1965.
[8] D. Sarafyan, "New algorithm for the continuous approximate solution of ordinary differential equations and the upgrading of the order of the processes," Comput. Math. Appl., vol. 20, no. 1, pp. 71-100, 1990.
[9] S. N. Jator and J. Li, "A self-starting linear multistep method for a direct solution of the general second-order initial value problem," Int. J. Comput. Math., vol. 86, no. 5, pp. 827-836, 2009.
[10] D. O. Awoyemi, "A new sixth-order algorithm for general second order ordinary differential equations," Int. J. Comput. Math., vol. 77, no. 1, pp. 117-124, 2001.
[11] A. J. Meade and A. A. Fernandez, "The numerical solution of linear ordinary differential equations by feedforward neural networks," Math. Comput. Model., vol. 19, no. 12, pp. 1-25, 1994.
[12] H. Lee and I. Kang, "Neural algorithms for solving differential equations," J. Comput. Phys., vol. 91, no. 1, pp. 110-117, 1990.
[13] B. P. van Milligen, V. Tribaldos, and J. A. Jiménez, "Neural network differential equation and plasma equilibrium solver," Phys. Rev. Lett., vol. 75, no. 20, pp. 3594-3597, 1995.
[14] L. P. Aarts and P. Van der Veer, Solving Nonlinear Differential Equations by a Neural Network Method (Lecture Notes in Computer Science), vol. 2074. New York: Springer-Verlag, 2001, pp. 181-189.
[15] P. Ramuhalli, L. Udpa, and S. S. Udpa, "Finite-element neural networks for solving differential equations," IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1381-1392, Nov. 2005.
[16] K. S. McFall and J. R. Mahan, "Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions," IEEE Trans. Neural Netw., vol. 20, no. 8, pp. 1221-1233, Aug. 2009.
[17] I. G. Tsoulos, D. Gavrilis, and E. Glavas, "Solving differential equations with constructed neural networks," Neurocomputing, vol. 72, nos. 10-12, pp. 2385-2391, 2009.
[18] I. Lagaris, A. Likas, and D. I. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987-1000, Sep. 1998.
[19] H. S. Yazdi, M. Pakdaman, and H. Modaghegh, "Unsupervised kernel least mean square algorithm for solving ordinary differential equations," Neurocomputing, vol. 74, nos. 12-13, pp. 2062-2071, 2011.
[20] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[21] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[22] J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett., vol. 9, no. 3, pp. 293-300, 1999.
[23] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. Singapore: World Scientific, 2002.
[24] J. A. K. Suykens, J. Vandewalle, and B. De Moor, "Optimal control by least squares support vector machines," Neural Netw., vol. 14, no. 1, pp. 23-35, 2001.
[25] K. De Brabanter, J. De Brabanter, J. A. K. Suykens, and B. De Moor, "Approximate confidence and prediction intervals for least squares support vector regression," IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 110-120, Jan. 2011.
[26] H. Ning, X. Jing, and L. Cheng, "Online identification of nonlinear spatiotemporal systems using kernel learning approach," IEEE Trans. Neural Netw., vol. 22, no. 9, pp. 1381-1394, Sep. 2011.
[27] M. Lázaro, I. Santamaria, F. Pérez-Cruz, and A. Artés-Rodríguez, "Support vector regression for the simultaneous learning of a multivariate function and its derivative," Neurocomputing, vol. 69, nos. 1-3, pp. 42-61, 2005.
[28] D. R. Kincaid and E. W. Cheney, Numerical Analysis: Mathematics of Scientific Computing, 3rd ed. Pacific Grove, CA: Brooks/Cole, 2002.
[29] A. Toselli and O. B. Widlund, Domain Decomposition Methods—Algorithms and Theory. Berlin, Germany: Springer-Verlag, 2005.
[30] I. G. Tsoulos and I. E. Lagaris, "Solving differential equations with genetic programming," Genetic Program. Evolvable Mach., vol. 7, no. 1, pp. 33-54, 2006.

Siamak Mehrkanoon received the B.S. degree in pure mathematics in 2005 and the M.S. degree in applied mathematics from the Iran University of Science and Technology, Tehran, Iran, in 2007. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
His current research interests include machine learning, system identification, pattern recognition, and numerical algorithms.

Tillmann Falck received the Dipl.-Ing. degree in electrical engineering from Ruhr University Bochum, Bochum, Germany, in 2007. He is pursuing the Ph.D. degree in electrical engineering from Katholieke Universiteit Leuven, Leuven, Belgium.
He has been with Robert Bosch GmbH, Wuerttemberg, Germany, since 2012. His research interests include nonlinear system identification, convex optimization, machine learning, and tracking and driver assistance systems.

Johan A. K. Suykens (SM'05) was born in Willebroek, Belgium, on May 18, 1966. He received the M.S. degree in electromechanical engineering and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1989 and 1995, respectively.
He was a Visiting Post-Doctoral Researcher with the University of California, Berkeley, in 1996. He has been a Post-Doctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a Professor (Hoogleraar) with K.U. Leuven. He has authored the books Artificial Neural Networks for Modelling and Control of Non-linear Systems (Kluwer Academic Publishers) and Least Squares Support Vector Machines (World Scientific), co-authored the book Cellular Neural Networks, Multi-Scroll Chaos and Synchronization (World Scientific), and edited the books Nonlinear Modeling: Advanced Black-Box Techniques (Kluwer Academic Publishers) and Advances in Learning Theory: Methods, Models and Applications (IOS Press). In 1998, he organized an International Workshop on Nonlinear Modeling with Time-Series Prediction Competition.
Dr. Suykens has served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS from 1997 to 1999 and from 2004 to 2007, and for the IEEE TRANSACTIONS ON NEURAL NETWORKS from 1998 to 2009. He was a recipient of the IEEE Signal Processing Society Best Paper (Senior) Award in 1999 and several Best Paper Awards at international conferences. He was a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks. He has served as a Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven, 2002), as a Program Co-Chair for the International Joint Conference on Neural Networks in 2004 and the International Symposium on Nonlinear Theory and its Applications in 2005, as an organizer of the International Symposium on Synchronization in Complex Networks in 2007, and as a co-organizer of the NIPS 2010 workshop on Tensors, Kernels and Machine Learning. He was awarded an ERC Advanced Grant in 2011.
