Mehrkanoon 2012
9, SEPTEMBER 2012
Abstract— In this paper, a new approach based on least squares support vector machines (LS-SVMs) is proposed for solving linear and nonlinear ordinary differential equations (ODEs). The approximate solution is presented in closed form by means of LS-SVMs, whose parameters are adjusted to minimize an appropriate error function. For the linear and nonlinear cases, these parameters are obtained by solving a system of linear and nonlinear equations, respectively. The method is well suited to solving mildly stiff, nonstiff, and singular ODEs with initial and boundary conditions. Numerical results demonstrate the efficiency of the proposed method over existing methods.

Index Terms— Closed-form approximate solution, collocation method, least squares support vector machines (LS-SVMs), ordinary differential equations (ODEs).

I. INTRODUCTION

Many methods have been developed for solving IVPs of ODEs, such as Runge–Kutta, finite difference, predictor-corrector, and collocation methods [3]–[6]. Generally speaking, numerical methods for approximating the solution of BVPs fall into two classes: difference methods (e.g., the shooting method) and weighted residual or series methods. In the shooting method, one tries to reduce the problem to IVPs by providing a sufficiently good approximation of the derivative values at the initial point.

Concerning higher order ODEs, the most common approach is to reduce the problem to a system of first-order differential equations and then solve that system with one of the available methods, an approach that has been studied extensively in the literature; see [4], [7], and [8]. However, as some authors have remarked, this approach wastes a lot of computer time
1358 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 23, NO. 9, SEPTEMBER 2012

existence of many local minima solutions. The second problem is how to choose the number of hidden units.

Support vector machines (SVMs) are a powerful method for solving pattern recognition and function estimation problems [20], [21]. In this method, one maps the data into a high-dimensional feature space and solves a linear regression problem there, which leads to a quadratic programming problem. Least squares SVMs (LS-SVMs) for function estimation, classification, unsupervised learning, and other problems have been investigated in [22]–[26]. In this case, the problem formulation involves equality instead of inequality constraints, and training for regression and classification problems is done by solving a set of linear equations. The purpose of this paper is to introduce a new approach based on LS-SVMs for solving ODEs.

This paper uses the following notation. Vector-valued variables are denoted in lowercase boldface, whereas variables that are neither boldfaced nor capitalized are scalar valued. Matrices are denoted by capital letters. Euler Script (euscript) font is used for operators.

This paper is organized as follows. In Section II, the problem statement is given. In Section III, we formulate our LS-SVM method for the solution of linear differential equations. Section IV is devoted to the formulation of the method for nonlinear first-order ODEs. Model selection and the practical implementation of the proposed method are discussed in Section V. Section VI describes the numerical experiments, discussion, and comparison with other known methods.

II. PROBLEM STATEMENT

This section describes the problem statement. In Section II-A, a short introduction to LS-SVMs for regression is given to highlight the difference from the problem considered in this paper. Finally, some operators that will be used in the following sections are defined.

Consider the general m-th order linear ODE with time-varying coefficients of the form

$$\mathcal{L}[y] \equiv \sum_{\ell=0}^{m} f_\ell(t)\, y^{(\ell)}(t) = r(t), \qquad t \in [a, c] \tag{1}$$

where $\mathcal{L}$ represents an m-th order linear differential operator, $[a, c]$ is the problem domain, and $r(t)$ is the input signal. The $f_\ell(t)$ are known functions and $y^{(\ell)}(t)$ denotes the $\ell$-th derivative of $y$ with respect to $t$. The $m$ necessary initial or boundary conditions for solving the above differential equation are

IVP: $\mathcal{IC}_\mu[y(t)] = p_\mu, \quad \mu = 0, \ldots, m-1$
BVP: $\mathcal{BC}_\mu[y(t)] = q_\mu, \quad \mu = 0, \ldots, m-1$

where $\mathcal{IC}_\mu$ are the initial conditions (all constraints are applied at the same value of the independent variable, i.e., $t = a$) and $\mathcal{BC}_\mu$ are the boundary conditions (the constraints are applied at multiple values of the independent variable $t$, typically at the ends of the interval $[a, c]$ in which the solution is sought). $p_\mu$ and $q_\mu$ are given scalars.

A differential equation (1) is said to be stiff when its exact solution consists of a steady-state term that does not grow significantly with time, together with a transient term that decays exponentially to zero. Problems involving rapidly decaying transient solutions occur naturally in a wide variety of applications, including the study of damped mass–spring systems and the analysis of control systems (see [4] for more details).

If the coefficient functions $f_\ell(t)$ of (1) fail to be analytic at the point $t = a$, then (1) is called a singular ODE.

The approaches given in [18] and [19] define a trial solution as a sum of two terms, i.e., $y(t) = H(t) + F(t, N(t, P))$. The first term $H(t)$, which has to be defined by the user and in some cases is not straightforward to construct, satisfies the initial/boundary conditions. The second term $F(t, N(t, P))$ contains a single-output feedforward neural network $N$ with input $t$ and parameters $P$. In contrast with the approaches given in [18] and [19], we build the model by incorporating the initial/boundary conditions as constraints of an optimization problem. This significantly reduces the burden placed on the user, as a potentially difficult problem is handled automatically by the proposed technique.

A. LS-SVM Regression

Let us consider a given training set $\{x_i, y_i\}_{i=1}^N$ with input data $x_i \in \mathbb{R}$ and output data $y_i \in \mathbb{R}$. For the purpose of this paper, we only use a 1-D input space. The goal in a regression problem is to estimate a model of the form $y(x) = w^T\varphi(x) + b$.

The primal LS-SVM model for regression can be written as follows [23]:

$$\begin{aligned}
\underset{w,b,e}{\text{minimize}}\quad & \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e\\
\text{s.t.}\quad & y_i = w^T\varphi(x_i) + b + e_i, \quad i = 1, \ldots, N
\end{aligned} \tag{2}$$

where $\gamma \in \mathbb{R}^+$, $b \in \mathbb{R}$, $w \in \mathbb{R}^h$. $\varphi(\cdot): \mathbb{R} \to \mathbb{R}^h$ is the feature map and $h$ is the dimension of the feature space. The dual solution is then given by

$$\begin{bmatrix} \Omega + I_N/\gamma & 1_N \\ 1_N^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}$$

where $\Omega_{ij} = K(x_i, x_j) = \varphi(x_i)^T\varphi(x_j)$ is the $ij$-th entry of the positive definite kernel matrix, $1_N = [1, \ldots, 1]^T \in \mathbb{R}^N$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, $y = [y_1, \ldots, y_N]^T$, and $I_N$ is the identity matrix. The model in the dual form becomes $y(x) = \sum_{i=1}^N \alpha_i K(x, x_i) + b$. It should be noted that if $b = 0$, for an explicitly known and finite-dimensional feature map $\varphi$ the problem could be solved in the primal (ridge regression) by eliminating $e$, and then $w$ would be the only unknown. In the LS-SVM approach, however, the feature map $\varphi$ is in general not explicitly known and can be infinite dimensional. Therefore, the kernel trick is used and the problem is solved in the dual [22]. When we deal with differential equations, the target values $y_i$ are not directly available, so the regression approach does not directly apply. Nevertheless, we can incorporate the
underlying differential equation in the learning process to find an approximation for the solution.

Let us assume an explicit model $\hat y(t) = w^T\varphi(t) + b$ as an approximation for the solution of the differential equation. Since no data are available from which to learn the solution, we have to substitute our model into the given differential equation. Therefore, we need to define the derivative of the kernel function. Making use of Mercer's theorem [21], derivatives of the feature map can be written in terms of derivatives of the kernel function [27]. Let us define the following differential operator, which will be used in subsequent sections:

$$\nabla_n^m \equiv \frac{\partial^{n+m}}{\partial u^n\, \partial v^m}. \tag{3}$$

If $\varphi(u)^T\varphi(v) = K(u, v)$, then one can show that

$$[\varphi^{(n)}(u)]^T \varphi^{(m)}(v) = \nabla_n^m[\varphi(u)^T\varphi(v)] = \nabla_n^m[K(u,v)] = \frac{\partial^{n+m} K(u,v)}{\partial u^n\, \partial v^m}. \tag{4}$$

Using (4), it is possible to express all derivatives of the feature map in terms of the kernel function itself (provided that the kernel function is sufficiently differentiable). For instance, the following relations hold:

$$\begin{aligned}
\nabla_1^0[K(u,v)] &= \frac{\partial(\varphi(u)^T\varphi(v))}{\partial u} = \varphi^{(1)}(u)^T\varphi(v)\\
\nabla_0^1[K(u,v)] &= \frac{\partial(\varphi(u)^T\varphi(v))}{\partial v} = \varphi(u)^T\varphi^{(1)}(v)\\
\nabla_2^0[K(u,v)] &= \frac{\partial^2(\varphi(u)^T\varphi(v))}{\partial u^2} = \varphi^{(2)}(u)^T\varphi(v).
\end{aligned}$$

III. FORMULATION OF THE METHOD FOR THE LINEAR ODE CASE

Let us assume that a general approximate solution to (1) is of the form $\hat y(t) = w^T\varphi(t) + b$, where $w$ and $b$ are unknowns of the model that have to be determined. To obtain the optimal values of these parameters, collocation methods can be used [28], which assume a discretization of the interval $[a, c]$ into a set of collocation points $\Upsilon = \{a = t_1 < t_2 < \cdots < t_N = c\}$. Therefore, $w$ and $b$ are found by solving the following optimization problems.

For the IVP case:

$$\begin{aligned}
\underset{\hat y}{\text{minimize}}\quad & \frac{1}{2}\sum_{i=1}^{N} \big(\mathcal{L}[\hat y] - r\big)^2(t_i)\\
\text{s.t.}\quad & \mathcal{IC}_\mu[\hat y(t)] = p_\mu, \quad \mu = 0, \ldots, m-1.
\end{aligned} \tag{5}$$

For the BVP case:

$$\begin{aligned}
\underset{\hat y}{\text{minimize}}\quad & \frac{1}{2}\sum_{i=1}^{N} \big(\mathcal{L}[\hat y] - r\big)^2(t_i)\\
\text{s.t.}\quad & \mathcal{BC}_\mu[\hat y(t)] = q_\mu, \quad \mu = 0, \ldots, m-1
\end{aligned} \tag{6}$$

where $N$ is the number of collocation points (which is equal to the number of training points) used to undertake the learning process. In what follows, we formulate the optimization problem in the LS-SVM framework for solving linear ODEs.

For notational convenience, let us list the following notations, which are used in the following sections:

$$\begin{aligned}
[\nabla_n^m K](t, s) &= [\nabla_n^m K(u,v)]\big|_{u=t,\, v=s}\\
[\Omega_n^m]_{i,j} &= \nabla_n^m[K(u,v)]\big|_{u=t_i,\, v=t_j} = \frac{\partial^{n+m} K(u,v)}{\partial u^n\, \partial v^m}\Big|_{u=t_i,\, v=t_j}\\
[\Omega_0^0]_{i,j} &= \nabla_0^0[K(u,v)]\big|_{u=t_i,\, v=t_j} = K(t_i, t_j)
\end{aligned}$$

where $[\Omega_n^m]_{i,j}$ denotes the $(i, j)$-th entry of matrix $\Omega_n^m$. The notation $M_{k:l,m:n}$ is used for selecting the submatrix of a matrix $M$ consisting of rows $k$ to $l$ and columns $m$ to $n$. $M_{i,:}$ denotes the $i$-th row of matrix $M$, and $M_{:,j}$ denotes the $j$-th column of matrix $M$.

A. First-Order IVP

As a first example, consider the following first-order IVP:

$$y'(t) - f_1(t)\, y(t) = r(t), \quad y(a) = p_1, \quad a \le t \le c. \tag{7}$$

In the LS-SVM framework, the approximate solution can be obtained by solving the following optimization problem:

$$\begin{aligned}
\underset{w,b,e}{\text{minimize}}\quad & \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e\\
\text{s.t.}\quad & w^T\varphi'(t_i) = f_1(t_i)\big[w^T\varphi(t_i) + b\big] + r(t_i) + e_i, \quad i = 2, \ldots, N\\
& w^T\varphi(t_1) + b = p_1.
\end{aligned} \tag{8}$$

This problem is obtained by combining the LS-SVM cost function with constraints that require the approximate solution $\hat y(t) = w^T\varphi(t) + b$, given by the LS-SVM model, to satisfy the given differential equation with the corresponding initial condition at the collocation points $\{t_i\}_{i=1}^N$. Problem (8) is a quadratic minimization under linear equality constraints, which enables an efficient solution.

Lemma 3.1: Given a positive definite kernel function $K: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ with $K(t, s) = \varphi(t)^T\varphi(s)$ and a regularization constant $\gamma \in \mathbb{R}^+$, the solution to (8) is obtained by solving the following dual problem:

$$\begin{bmatrix}
\mathcal{K} + I_{N-1}/\gamma & h_{p_1} & -f_1\\
h_{p_1}^T & 1 & 1\\
-f_1^T & 1 & 0
\end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \\ b \end{bmatrix}
=
\begin{bmatrix} r \\ p_1 \\ 0 \end{bmatrix} \tag{9}$$

with

$$\begin{aligned}
\alpha &= [\alpha_2, \ldots, \alpha_N]^T, \qquad f_1 = [f_1(t_2), \ldots, f_1(t_N)]^T \in \mathbb{R}^{N-1}\\
r &= [r(t_2), \ldots, r(t_N)]^T \in \mathbb{R}^{N-1}\\
\mathcal{K} &= \tilde\Omega_1^1 - D_1\tilde\Omega_0^1 - \tilde\Omega_1^0 D_1 + D_1\tilde\Omega_0^0 D_1\\
h_{p_1} &= [\Omega_0^1]_{1,2:N}^T - D_1[\Omega_0^0]_{1,2:N}^T.
\end{aligned}$$

$D_1$ is a diagonal matrix with the elements of $f_1$ on the main diagonal. $[\Omega_n^m]_{1,2:N} = [[\Omega_n^m]_{1,2}, \ldots, [\Omega_n^m]_{1,N}]$ and $\tilde\Omega_n^m = [\Omega_n^m]_{2:N,2:N}$ for $n, m = 0, 1$. Also note that $\mathcal{K} \in \mathbb{R}^{(N-1)\times(N-1)}$ and $h_{p_1} \in \mathbb{R}^{N-1}$.
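To make Lemma 3.1 concrete, here is a minimal NumPy sketch that assembles and solves (9). It assumes the Gaussian RBF kernel used in Section VI, $K(u, v) = \exp(-(u - v)^2/\sigma^2)$, so the entries of $\Omega_0^0$, $\Omega_1^0$, $\Omega_0^1$, and $\Omega_1^1$ have closed forms. The function names are hypothetical. The returned model is $\hat y(t) = \sum_{i=2}^N \alpha_i([\nabla_1^0 K](t_i, t) - f_1(t_i)[\nabla_0^0 K](t_i, t)) + \beta[\nabla_0^0 K](t_1, t) + b$. This is our reading of the stationarity condition $w = \sum_i \alpha_i(\varphi'(t_i) - f_1(t_i)\varphi(t_i)) + \beta\varphi(t_1)$ of the Lagrangian of (8). The value $\sigma = 1$ and the hand-derived reference solution of Problem 1 are illustrative choices only.

```python
import numpy as np


def rbf_derivatives(u, v, sigma):
    """RBF kernel K(u, v) = exp(-(u - v)^2 / sigma^2) and derivatives on the grid u x v.

    Returns the matrices [Omega_0^0], [Omega_1^0] (d/du), [Omega_0^1] (d/dv)
    and [Omega_1^1] (d^2/du dv) evaluated at (u_i, v_j).
    """
    d = np.asarray(u, float)[:, None] - np.asarray(v, float)[None, :]
    K = np.exp(-d**2 / sigma**2)
    K_u = -2.0 * d / sigma**2 * K
    K_v = 2.0 * d / sigma**2 * K
    K_uv = (2.0 / sigma**2 - 4.0 * d**2 / sigma**4) * K
    return K, K_u, K_v, K_uv


def lssvm_first_order_ivp(f1, r, p1, t, sigma, gamma=1e10):
    """Solve y'(t) - f1(t) y(t) = r(t), y(t[0]) = p1, through the dual system (9)."""
    t = np.asarray(t, float)
    n = len(t) - 1                                    # ODE constraints at t_2..t_N
    O00, O10, O01, O11 = rbf_derivatives(t, t, sigma)
    f = f1(t[1:])
    D1 = np.diag(f)
    # calK = Omega~_1^1 - D1 Omega~_0^1 - Omega~_1^0 D1 + D1 Omega~_0^0 D1 (rows/cols 2..N)
    calK = (O11[1:, 1:] - D1 @ O01[1:, 1:] - O10[1:, 1:] @ D1
            + D1 @ O00[1:, 1:] @ D1)
    h_p1 = O01[0, 1:] - f * O00[0, 1:]                # [Omega_0^1]_{1,2:N} - D1 [Omega_0^0]_{1,2:N}

    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = calK + np.eye(n) / gamma
    A[:n, n] = A[n, :n] = h_p1
    A[:n, n + 1] = A[n + 1, :n] = -f
    A[n, n] = O00[0, 0]                               # K(t_1, t_1), equal to 1 for the RBF kernel
    A[n, n + 1] = A[n + 1, n] = 1.0
    sol = np.linalg.solve(A, np.concatenate([r(t[1:]), [p1, 0.0]]))
    alpha, beta, b = sol[:n], sol[n], sol[n + 1]

    def y_hat(s):
        s = np.atleast_1d(np.asarray(s, float))
        S00, S10, _, _ = rbf_derivatives(t, s, sigma)  # rows t_i, columns s
        basis = S10[1:] - f[:, None] * S00[1:]         # (phi'(t_i) - f1(t_i) phi(t_i))^T phi(s)
        return alpha @ basis + beta * S00[0] + b

    return y_hat


# Problem 1: y' + 2y = sin(t), y(0) = 1, i.e. f1(t) = -2 and r(t) = sin(t)
t = np.linspace(0.0, 10.0, 50)
y_hat = lssvm_first_order_ivp(lambda s: -2.0 * np.ones_like(s), np.sin, 1.0, t, sigma=1.0)
s = np.linspace(0.0, 10.0, 201)
exact = 0.4 * np.sin(s) - 0.2 * np.cos(s) + 1.2 * np.exp(-2.0 * s)  # closed form, derived by hand
print("max abs error on [0, 10]:", np.max(np.abs(y_hat(s) - exact)))
```

The initial condition enters (8) as a hard equality constraint, so $\hat y(t_1) = p_1$ holds up to the accuracy of the linear solve. The ODE itself is enforced only in the least-squares sense through $e$.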
MEHRKANOON et al.: APPROXIMATE SOLUTIONS TO ODEs USING LS-SVMs 1359
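Proof: The Lagrangian of the constrained optimization problem (8) becomes

$$\mathcal{L}(w, b, e_i, \alpha_i, \beta) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{i=2}^{N} \alpha_i \Big( w^T\big[\varphi'(t_i) - f_1(t_i)\varphi(t_i)\big] \cdots$$

The approximate solution, $\hat y(t) = w^T\varphi(t) + b$, is then obtained by solving the following optimization problem:

$$\underset{w,b,e}{\text{minimize}}\ \ \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e \quad \text{s.t.}\quad w^T\varphi''(t_i) = f_1(t_i)\, w^T\varphi'(t_i) + \cdots$$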
$$\begin{aligned}
\frac{\partial\mathcal{L}}{\partial\alpha_i} = 0 &\;\rightarrow\; w^T\big(\varphi_i'' - f_1(t_i)\varphi_i' - f_2(t_i)\varphi_i\big) - f_2(t_i)\, b - e_i = r_i, \quad i = 2, \ldots, N\\
\frac{\partial\mathcal{L}}{\partial\beta_1} = 0 &\;\rightarrow\; w^T\varphi_1 + b = p_1\\
\frac{\partial\mathcal{L}}{\partial\beta_2} = 0 &\;\rightarrow\; w^T\varphi_1' = p_2
\end{aligned}$$

where $\varphi_i = \varphi(t_i)$, $\varphi_i' = \varphi'(t_i)$, and $\varphi_i'' = \varphi''(t_i)$ for $i = 1, \ldots, N$.

Applying the kernel trick and eliminating $w$ and $\{e_i\}_{i=2}^N$ leads to

$$\begin{cases}
r_i = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_2^2]_{j,i} - f_1(t_i)\big([\Omega_2^1]_{j,i} - f_1(t_j)[\Omega_1^1]_{j,i} - f_2(t_j)[\Omega_0^1]_{j,i}\big)\\
\qquad\quad - f_2(t_i)\big([\Omega_2^0]_{j,i} - f_1(t_j)[\Omega_1^0]_{j,i} - f_2(t_j)[\Omega_0^0]_{j,i}\big) - f_1(t_j)[\Omega_1^2]_{j,i} - f_2(t_j)[\Omega_0^2]_{j,i} \Big)\\
\qquad\quad + \beta_1\big([\Omega_0^2]_{1,i} - f_1(t_i)[\Omega_0^1]_{1,i} - f_2(t_i)[\Omega_0^0]_{1,i}\big)\\
\qquad\quad + \beta_2\big([\Omega_1^2]_{1,i} - f_1(t_i)[\Omega_1^1]_{1,i} - f_2(t_i)[\Omega_1^0]_{1,i}\big) + \dfrac{\alpha_i}{\gamma} - f_2(t_i)\, b, \quad i = 2, \ldots, N\\[4pt]
p_1 = \sum_{j=2}^{N} \alpha_j\big([\Omega_2^0]_{j,1} - f_1(t_j)[\Omega_1^0]_{j,1} - f_2(t_j)[\Omega_0^0]_{j,1}\big) + \beta_1[\Omega_0^0]_{1,1} + \beta_2[\Omega_1^0]_{1,1} + b\\[4pt]
p_2 = \sum_{j=2}^{N} \alpha_j\big([\Omega_2^1]_{j,1} - f_1(t_j)[\Omega_1^1]_{j,1} - f_2(t_j)[\Omega_0^1]_{j,1}\big) + \beta_1[\Omega_0^1]_{1,1} + \beta_2[\Omega_1^1]_{1,1}\\[4pt]
0 = \sum_{j=2}^{N} \alpha_j f_2(t_j) - \beta_1.
\end{cases}$$

Finally, writing these equations in matrix form results in the linear system (12).

The LS-SVM model for the solution and its derivative in the dual form becomes

$$\begin{aligned}
\hat y(t) &= \sum_{i=2}^{N} \alpha_i\Big([\nabla_2^0 K](t_i, t) - f_1(t_i)[\nabla_1^0 K](t_i, t) - f_2(t_i)[\nabla_0^0 K](t_i, t)\Big) + \beta_1[\nabla_0^0 K](t_1, t) + \beta_2[\nabla_1^0 K](t_1, t) + b\\
\frac{d\hat y(t)}{dt} &= \sum_{i=2}^{N} \alpha_i\Big([\nabla_2^1 K](t_i, t) - f_1(t_i)[\nabla_1^1 K](t_i, t) - f_2(t_i)[\nabla_0^1 K](t_i, t)\Big) + \beta_1[\nabla_0^1 K](t_1, t) + \beta_2[\nabla_1^1 K](t_1, t).
\end{aligned}$$

2) BVP Case: Consider the second-order BVP of the form

$$y''(t) = f_1(t)\, y'(t) + f_2(t)\, y(t) + r(t), \quad t \in [a, c], \qquad y(a) = p_1, \quad y(c) = q_1.$$

Then the parameters of the closed-form approximation of the solution can be obtained by solving the following optimization problem:

$$\begin{aligned}
\underset{w,b,e}{\text{minimize}}\quad & \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e\\
\text{s.t.}\quad & w^T\varphi''(t_i) = f_1(t_i)\, w^T\varphi'(t_i) + f_2(t_i)\big[w^T\varphi(t_i) + b\big] + r(t_i) + e_i, \quad i = 2, \ldots, N-1\\
& w^T\varphi(t_1) + b = p_1\\
& w^T\varphi(t_N) + b = q_1.
\end{aligned} \tag{14}$$

The same procedure can be applied to derive the Lagrangian and then the KKT optimality conditions. One can then show that the solution to (14) is obtained by solving the following linear system:

$$\begin{bmatrix}
\mathcal{K} + I_{N-2}/\gamma & h_{p_1} & h_{q_1} & -f_2\\
h_{p_1}^T & 1 & [\Omega_0^0]_{N,1} & 1\\
h_{q_1}^T & [\Omega_0^0]_{1,N} & 1 & 1\\
-f_2^T & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ b \end{bmatrix}
=
\begin{bmatrix} r \\ p_1 \\ q_1 \\ 0 \end{bmatrix}$$

where

$$\begin{aligned}
\alpha &= [\alpha_2, \ldots, \alpha_{N-1}]^T\\
f_1 &= [f_1(t_2), \ldots, f_1(t_{N-1})]^T \in \mathbb{R}^{N-2}\\
f_2 &= [f_2(t_2), \ldots, f_2(t_{N-1})]^T \in \mathbb{R}^{N-2}\\
r &= [r(t_2), \ldots, r(t_{N-1})]^T \in \mathbb{R}^{N-2}\\
\mathcal{K} &= \tilde\Omega_2^2 - \tilde\Omega_2^1 D_1 - \tilde\Omega_2^0 D_2 - D_1\tilde\Omega_1^2 + D_1\tilde\Omega_1^1 D_1 + D_1\tilde\Omega_1^0 D_2 - D_2\tilde\Omega_0^2 + D_2\tilde\Omega_0^1 D_1 + D_2\tilde\Omega_0^0 D_2\\
h_{p_1} &= [\Omega_0^2]_{1,2:N-1}^T - D_1[\Omega_0^1]_{1,2:N-1}^T - D_2[\Omega_0^0]_{1,2:N-1}^T\\
h_{q_1} &= [\Omega_0^2]_{N,2:N-1}^T - D_1[\Omega_0^1]_{N,2:N-1}^T - D_2[\Omega_0^0]_{N,2:N-1}^T.
\end{aligned}$$

$D_1$ and $D_2$ are diagonal matrices with the elements of $f_1$ and $f_2$ on the main diagonal, respectively. Note that $\mathcal{K} \in \mathbb{R}^{(N-2)\times(N-2)}$ and $h_{p_1}, h_{q_1} \in \mathbb{R}^{N-2}$. $[\Omega_n^m]_{1,2:N-1} = [[\Omega_n^m]_{1,2}, \ldots, [\Omega_n^m]_{1,N-1}]$ and $[\Omega_n^m]_{N,2:N-1} = [[\Omega_n^m]_{N,2}, \ldots, [\Omega_n^m]_{N,N-1}]$ for $n = 0, 1$ and $m = 0, 1, 2$. $\tilde\Omega_n^m = [\Omega_n^m]_{2:N-1,2:N-1}$ for $m, n = 0, 1, 2$.

The LS-SVM model for the solution and its derivative are expressed in dual form as

$$\hat y(t) = \sum_{i=2}^{N-1} \alpha_i\Big([\nabla_2^0 K](t_i, t) - f_1(t_i)[\nabla_1^0 K](t_i, t) - f_2(t_i)[\nabla_0^0 K](t_i, t)\Big) + \beta_1[\nabla_0^0 K](t_1, t) + \beta_2[\nabla_0^0 K](t_N, t) + b$$
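$$\frac{d\hat y(t)}{dt} = \sum_{i=2}^{N-1} \alpha_i\Big([\nabla_2^1 K](t_i, t) - f_1(t_i)[\nabla_1^1 K](t_i, t) - f_2(t_i)[\nabla_0^1 K](t_i, t)\Big) + \beta_1[\nabla_0^1 K](t_1, t) + \beta_2[\nabla_0^1 K](t_N, t).$$

C. m-th Order Linear ODE

Let us now consider the general m-th order IVP of the following form:

$$\begin{aligned}
y^{(m)}(t) - \sum_{i=1}^{m} f_i(t)\, y^{(m-i)}(t) &= r(t), \quad t \in [a, c]\\
y(a) &= p_1\\
y^{(i-1)}(a) &= p_i, \quad i = 2, \ldots, m.
\end{aligned} \tag{15}$$

The approximate solution can be obtained by solving the following optimization problem:

$$\begin{aligned}
\underset{w,b,e}{\text{minimize}}\quad & \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e\\
\text{s.t.}\quad & w^T\varphi^{(m)}(t_i) = w^T\Big[\sum_{k=1}^{m} f_k(t_i)\,\varphi^{(m-k)}(t_i)\Big] + f_m(t_i)\, b + r(t_i) + e_i, \quad i = 2, \ldots, N\\
& w^T\varphi(t_1) + b = p_1\\
& w^T\varphi^{(i-1)}(t_1) = p_i, \quad i = 2, \ldots, m.
\end{aligned} \tag{16}$$

Lemma 3.3: Given a positive definite kernel function $K: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ with $K(t, s) = \varphi(t)^T\varphi(s)$ and a regularization constant $\gamma \in \mathbb{R}^+$, the solution to (16) is obtained by solving the following dual problem:

$$\begin{bmatrix}
\mathcal{K} + I_{N-1}/\gamma & K_{\mathrm{in}} & -f_m\\
K_{\mathrm{in}}^T & \big[[\Omega_{k-1}^{\ell-1}]_{1,1}\big]_{\ell,k=1}^{m} & C\\
-f_m^T & C^T & 0
\end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \\ b \end{bmatrix}
=
\begin{bmatrix} r \\ p \\ 0 \end{bmatrix} \tag{17}$$

$\{D_i\}_{i=1}^m$ are diagonal matrices with the elements of $\{f_i\}_{i=1}^m$ on the main diagonal, respectively. Also, $\bar\Omega_1$, $\bar\Omega$, and $\bar D$ are block matrices. $\tilde\Omega_n^m = [\Omega_n^m]_{2:N,2:N}$. Note that $\mathcal{K} \in \mathbb{R}^{(N-1)\times(N-1)}$.

Proof: The Lagrangian for (16) is given by

$$\begin{aligned}
\mathcal{L}(w, b, e_i, \alpha_i, \beta_i) ={}& \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{i=2}^{N} \alpha_i \Big( w^T\Big[\varphi^{(m)}(t_i) - \sum_{k=1}^{m} f_k(t_i)\,\varphi^{(m-k)}(t_i)\Big] - f_m(t_i)\, b - r_i - e_i \Big)\\
& - \beta_1\big(w^T\varphi(t_1) + b - p_1\big) - \beta_2\big(w^T\varphi'(t_1) - p_2\big) - \cdots - \beta_m\big(w^T\varphi^{(m-1)}(t_1) - p_m\big).
\end{aligned}$$

Eliminating $w$ and $\{e_i\}_{i=2}^N$ from the corresponding KKT optimality conditions yields the following set of equations:

$$\begin{cases}
r_i = \sum_{j=2}^{N} \alpha_j \Big( [\Omega_m^m]_{j,i} - \sum_{\ell=1}^{m} f_\ell(t_i)\Big([\Omega_m^{m-\ell}]_{j,i} - \sum_{k=1}^{m} f_k(t_j)[\Omega_{m-k}^{m-\ell}]_{j,i}\Big) - \sum_{k=1}^{m} f_k(t_j)[\Omega_{m-k}^{m}]_{j,i} \Big)\\
\qquad\quad + \sum_{\ell=1}^{m} \beta_\ell\Big([\Omega_{\ell-1}^{m}]_{1,i} - \sum_{k=1}^{m} f_k(t_i)[\Omega_{\ell-1}^{m-k}]_{1,i}\Big) + \dfrac{\alpha_i}{\gamma} - f_m(t_i)\, b, \quad i = 2, \ldots, N\\[4pt]
p_1 = \sum_{j=2}^{N} \alpha_j\Big([\Omega_m^0]_{j,1} - \sum_{k=1}^{m} f_k(t_j)[\Omega_{m-k}^0]_{j,1}\Big) + \sum_{k=1}^{m} \beta_k[\Omega_{k-1}^0]_{1,1} + b\\
\quad\vdots\\
p_m = \sum_{j=2}^{N} \alpha_j\Big([\Omega_m^{m-1}]_{j,1} - \sum_{k=1}^{m} f_k(t_j)[\Omega_{m-k}^{m-1}]_{j,1}\Big) + \sum_{k=1}^{m} \beta_k[\Omega_{k-1}^{m-1}]_{1,1}\\[4pt]
0 = \sum_{j=2}^{N} \alpha_j f_m(t_j) - \beta_1.
\end{cases}$$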
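The dual systems (9), (17), and the second-order BVP system after (14) share one block structure. Each has an ODE block $\mathcal{K} + I/\gamma$ and a block for the initial or boundary conditions. Each also has a bias row and column whose ODE entries equal the coefficient $-f_m$ of $y$. The sketch below assembles this structure for any order $m$ and any list of point conditions $y^{(k)}(\tau) = v$, so one routine covers both IVPs and two-point BVPs. Treating both cases with a single routine is our generalization for illustration, not a formulation stated in the paper, and the names are hypothetical. The sketch assumes the RBF kernel. Every $[\Omega_n^m]$ is obtained from $\frac{d^k}{dx^k}e^{-x^2} = (-1)^k H_k(x)\,e^{-x^2}$, where $H_k$ are the physicists' Hermite polynomials, evaluated with `numpy.polynomial.hermite.hermval`.

```python
import numpy as np
from numpy.polynomial.hermite import hermval


def kernel_derivative(n, m, u, v, sigma):
    """[Omega_n^m] = d^(n+m) K / du^n dv^m for K(u, v) = exp(-(u - v)^2 / sigma^2) on the grid u x v."""
    x = (np.asarray(u, float)[:, None] - np.asarray(v, float)[None, :]) / sigma
    coeffs = np.zeros(n + m + 1)
    coeffs[-1] = 1.0                     # selects the physicists' Hermite polynomial H_{n+m}
    # each d/du contributes (1/sigma) d/dx and each d/dv contributes (-1/sigma) d/dx
    return (-1.0) ** m * (-1.0 / sigma) ** (n + m) * hermval(x, coeffs) * np.exp(-x**2)


def lssvm_linear_ode(f, r, conditions, t_col, sigma, gamma=1e10):
    """Solve y^(m) = sum_k f_k(t) y^(m-k) + r(t) subject to y^(k)(tau) = value.

    f          : list [f_1, ..., f_m] of vectorised coefficient functions
    conditions : list of (tau, k, value) triples
    t_col      : points where the ODE is imposed (t_2..t_N for an IVP, t_2..t_{N-1} for a BVP)
    """
    m, t_col = len(f), np.asarray(t_col, float)
    # psi_i = sum_n a_n(t_i) phi^(n)(t_i) with a_m = 1 and a_{m-k} = -f_k
    a = {m: np.ones_like(t_col)}
    for k in range(1, m + 1):
        a[m - k] = -f[k - 1](t_col)
    taus = [c[0] for c in conditions]
    orders = [c[1] for c in conditions]
    delta = np.array([1.0 if k == 0 else 0.0 for k in orders])  # b only enters y itself

    nc, nb = len(t_col), len(conditions)
    K = sum(a[i][:, None] * kernel_derivative(i, j, t_col, t_col, sigma) * a[j][None, :]
            for i in range(m + 1) for j in range(m + 1))
    H = np.column_stack([
        sum(a[i] * kernel_derivative(i, k, t_col, [tau], sigma)[:, 0] for i in range(m + 1))
        for tau, k in zip(taus, orders)])
    G = np.array([[kernel_derivative(k1, k2, [t1], [t2], sigma)[0, 0]
                   for t2, k2 in zip(taus, orders)] for t1, k1 in zip(taus, orders)])

    A = np.zeros((nc + nb + 1, nc + nb + 1))
    A[:nc, :nc] = K + np.eye(nc) / gamma
    A[:nc, nc:-1], A[nc:-1, :nc], A[nc:-1, nc:-1] = H, H.T, G
    A[:nc, -1] = A[-1, :nc] = a[0]       # -f_m: coefficient of b in the ODE rows
    A[nc:-1, -1] = A[-1, nc:-1] = delta
    rhs = np.concatenate([r(t_col), [c[2] for c in conditions], [0.0]])
    sol = np.linalg.solve(A, rhs)
    alpha, beta, b = sol[:nc], sol[nc:-1], sol[-1]

    def y_hat(s):
        s = np.atleast_1d(np.asarray(s, float))
        out = sum(alpha @ (a[i][:, None] * kernel_derivative(i, 0, t_col, s, sigma))
                  for i in range(m + 1))
        out = out + sum(bk * kernel_derivative(k, 0, [tau], s, sigma)[0]
                        for bk, tau, k in zip(beta, taus, orders))
        return out + b

    return y_hat


# Problem 5: y'' + y = 2 + 2 sin(4t) cos(3t), y(0) = 1, y(1) = 0, i.e. f_1 = 0, f_2 = -1
t = np.linspace(0.0, 1.0, 20)
y5 = lssvm_linear_ode([lambda s: np.zeros_like(s), lambda s: -np.ones_like(s)],
                      lambda s: 2.0 + 2.0 * np.sin(4 * s) * np.cos(3 * s),
                      [(0.0, 0, 1.0), (1.0, 0, 0.0)], t[1:-1], sigma=0.5)
print(y5(np.array([0.0, 0.5, 1.0])))
```

For $m = 1$ with a single condition at $t_1$, the assembled matrix reduces to (9), using $[\Omega_0^0]_{1,1} = 1$ for the RBF kernel. For $m = 2$ with conditions $(a, 0, p_1)$ and $(c, 0, q_1)$, it reduces to the BVP system given after (14).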
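Then the matrix $\mathcal{K}$ can be written as $\tilde D\,\tilde\Omega\,\tilde D^T$. To show that $x^T\mathcal{K}x \ge 0$ for any $x$, it is sufficient to show that $\tilde x^T\tilde\Omega\,\tilde x \ge 0$ for any $\tilde x$, because $\tilde x = \tilde D^T x$ is included as a special case. Now consider the matrices of evaluations of the feature map and its derivatives

$$\Phi^{(i)} = [\varphi^{(i)}(t_2), \ldots, \varphi^{(i)}(t_N)] \quad \text{for } i = 0, \ldots, m-1$$

and denote their concatenation as $\tilde\Phi = [\Phi^{(0)}, \ldots, \Phi^{(m-1)}]$. Then $\|\tilde\Phi\tilde x\|_2^2 = \tilde x^T\tilde\Phi^T\tilde\Phi\,\tilde x = \tilde x^T\tilde\Omega\,\tilde x$ holds, where the last equality follows from an application of the kernel trick. Since the norm of any vector is a nonnegative real number, $\|\tilde\Phi\tilde x\|_2$ is greater than or equal to zero. Therefore, its square $\tilde x^T\tilde\Omega\,\tilde x$ is also nonnegative, which concludes the proof.

IV. FORMULATION OF THE METHOD FOR THE NONLINEAR ODE CASE

In this section, we formulate an optimization problem based on LS-SVMs for solving nonlinear first-order ODEs of the following form:

$$y' = f(t, y), \quad y(a) = p_1, \quad a \le t \le c. \tag{18}$$

One starts by assuming the approximate solution to be of the form $\hat y(t) = w^T\varphi(t) + b$. Additional unknowns $y_i$ are introduced to keep the constraints linear in $w$. This yields the following nonlinear optimization problem:

$$\begin{aligned}
\underset{w,b,e,\xi,y_i}{\text{minimize}}\quad & \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} e^T e + \tfrac{\gamma}{2} \xi^T \xi\\
\text{s.t.}\quad & w^T\varphi'(t_i) = f(t_i, y_i) + e_i, \quad i = 2, \ldots, N\\
& w^T\varphi(t_1) + b = p_1\\
& y_i = w^T\varphi(t_i) + b + \xi_i, \quad i = 2, \ldots, N.
\end{aligned} \tag{19}$$

$$\begin{bmatrix}
\Omega_{11} & (\tilde\Omega_0^1)^T & h_1^T & 0_{N-1} & 0\\
\tilde\Omega_0^1 & \Omega_{00} & h_0^T & 1_{N-1} & -I_{N-1}\\
h_1 & h_0 & 1 & 1 & 0_{N-1}^T\\
0_{N-1}^T & 1_{N-1}^T & 1 & 0 & 0_{N-1}^T\\
D(y) & I_{N-1} & 0_{N-1} & 0_{N-1} & 0
\end{bmatrix}
\begin{bmatrix} \alpha \\ \eta \\ \beta \\ b \\ y \end{bmatrix}
=
\begin{bmatrix} f(y) \\ 0_{N-1} \\ p_1 \\ 0 \\ 0_{N-1} \end{bmatrix} \tag{20}$$

where

$$\begin{aligned}
\Omega_{11} &= \tilde\Omega_1^1 + I_{N-1}/\gamma, \qquad \Omega_{00} = \tilde\Omega_0^0 + I_{N-1}/\gamma\\
D(y) &= \operatorname{diag}(f'(y))\\
f(y) &= [f(t_2, y_2), \ldots, f(t_N, y_N)]^T\\
f'(y) &= \Big[\frac{\partial f(t, y)}{\partial y}\Big|_{t=t_2,\, y=y_2}, \ldots, \frac{\partial f(t, y)}{\partial y}\Big|_{t=t_N,\, y=y_N}\Big]\\
\alpha &= [\alpha_2, \ldots, \alpha_N]^T, \quad \eta = [\eta_2, \ldots, \eta_N]^T, \quad y = [y_2, \ldots, y_N]^T\\
\tilde\Omega_0^0 &= [\Omega_0^0]_{2:N,2:N}, \quad \tilde\Omega_1^1 = [\Omega_1^1]_{2:N,2:N}, \quad \tilde\Omega_0^1 = [\Omega_0^1]_{2:N,2:N}\\
h_0 &= \big[[\Omega_0^0]_{1,2}, \ldots, [\Omega_0^0]_{1,N}\big]\\
h_1 &= \big[[\Omega_0^1]_{1,2}, \ldots, [\Omega_0^1]_{1,N}\big]\\
0_{N-1} &= [0, \ldots, 0]^T \in \mathbb{R}^{N-1}.
\end{aligned}$$

The nonlinear system (20), which consists of $3N - 1$ equations in $3N - 1$ unknowns $(\alpha, \eta, \beta, b, y)$, is solved by Newton's method. The model in the dual form becomes

$$\hat y(t) = \sum_{i=2}^{N} \alpha_i[\nabla_1^0 K](t_i, t) + \sum_{i=2}^{N} \eta_i[\nabla_0^0 K](t_i, t) + \beta[\nabla_0^0 K](t_1, t) + b. \tag{21}$$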
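Newton's method on (20) can be sketched as follows. The residual collects the five block rows of (20), and the Jacobian differentiates them with respect to $(\alpha, \eta, \beta, b, y)$. Only a few entries change between iterations: $-D(y)$ in the first row, and $D(y)$ and $\operatorname{diag}(\partial^2 f/\partial y^2 \cdot \alpha)$ in the last row. The user therefore supplies $f$, $\partial f/\partial y$, and $\partial^2 f/\partial y^2$. This is a sketch under stated assumptions, not the authors' implementation. It assumes the RBF kernel, a starting guess $y_i = p_1$ with all multipliers zero, and a dense Jacobian, and its names are hypothetical. The usage line applies it to Problem 3 with twenty equidistant points, where $\sigma = 0.25$ is an illustrative value.

```python
import numpy as np


def rbf_derivatives(u, v, sigma):
    """K, dK/du, dK/dv, d2K/dudv for K(u, v) = exp(-(u - v)^2 / sigma^2) on the grid u x v."""
    d = np.asarray(u, float)[:, None] - np.asarray(v, float)[None, :]
    K = np.exp(-d**2 / sigma**2)
    return (K, -2.0 * d / sigma**2 * K, 2.0 * d / sigma**2 * K,
            (2.0 / sigma**2 - 4.0 * d**2 / sigma**4) * K)


def lssvm_nonlinear_ivp(f, fy, fyy, p1, t, sigma, gamma=1e10, max_iter=30, tol=1e-12):
    """Newton iteration on the KKT system (20) for y' = f(t, y), y(t[0]) = p1."""
    t = np.asarray(t, float)
    n, ti = len(t) - 1, t[1:]
    O00, O10, O01, O11 = rbf_derivatives(t, t, sigma)
    W11 = O11[1:, 1:] + np.eye(n) / gamma        # Omega_11
    W00 = O00[1:, 1:] + np.eye(n) / gamma        # Omega_00
    W01 = O01[1:, 1:]                            # tilde Omega_0^1
    h0, h1 = O00[0, 1:], O01[0, 1:]
    k11 = O00[0, 0]
    I, col0, ones = np.eye(n), np.zeros((n, 1)), np.ones(n)
    alpha, eta, y = np.zeros(n), np.zeros(n), np.full(n, float(p1))
    beta = b = 0.0
    for _ in range(max_iter):
        g1, g2 = fy(ti, y), fyy(ti, y)
        res = np.concatenate([
            W11 @ alpha + W01.T @ eta + h1 * beta - f(ti, y),
            W01 @ alpha + W00 @ eta + h0 * beta + b - y,
            [h1 @ alpha + h0 @ eta + k11 * beta + b - p1],
            [beta + eta.sum()],
            g1 * alpha + eta,
        ])
        J = np.block([
            [W11, W01.T, h1[:, None], col0, -np.diag(g1)],
            [W01, W00, h0[:, None], ones[:, None], -I],
            [h1[None, :], h0[None, :], np.array([[k11]]), np.array([[1.0]]), np.zeros((1, n))],
            [np.zeros((1, n)), ones[None, :], np.array([[1.0]]), np.array([[0.0]]), np.zeros((1, n))],
            [np.diag(g1), I, col0, col0, np.diag(g2 * alpha)],
        ])
        dz = np.linalg.solve(J, -res)
        alpha, eta = alpha + dz[:n], eta + dz[n:2 * n]
        beta, b = beta + dz[2 * n], b + dz[2 * n + 1]
        y = y + dz[2 * n + 2:]
        if np.max(np.abs(dz[2 * n + 2:])) < tol:
            break

    def y_hat(s):
        S00, S10, _, _ = rbf_derivatives(t, np.atleast_1d(s), sigma)
        return alpha @ S10[1:] + eta @ S00[1:] + beta * S00[0] + b   # dual model (21)

    return y_hat


# Problem 3: y' = y^2 + t^2, y(0) = 1 on [0, 0.5] with twenty equidistant points
t = np.linspace(0.0, 0.5, 20)
y_p3 = lssvm_nonlinear_ivp(lambda t, y: y**2 + t**2, lambda t, y: 2 * y,
                           lambda t, y: 2 * np.ones_like(y), 1.0, t, sigma=0.25)
print(y_p3(np.array([0.0, 0.25, 0.5])))
```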
Algorithm 1 Approximating the Solution on a Large Interval
1: Decompose the domain [a, c] into S subdomains.
2: Set Δ = (c − a)/S, t_in := a, y_in := p_1, t_f := t_in + Δ.
3: for k = 1 to S do
4:   Obtain an LS-SVM model for the k-th segment [t_in, t_f], i.e., ŷ_k(t) = w_k^T ϕ_k(t) + b_k.
5:   Set t_in := t_f, y_in := ŷ_k(t_f), t_f := t_in + Δ.
6: end for
7: For a given test point t:
   1) check to which segment it belongs;
   2) use the corresponding model to compute the approximate solution at the given point.

TABLE I
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEMS 2–4

Problem  Domain    ‖y − ŷ‖∞        MSE             STD
2        Inside    4.56 × 10^−3    1.47 × 10^−6    1.16 × 10^−3
         Outside   4.62 × 10^−1    3.85 × 10^−2    1.56 × 10^−1
3        Inside    5.43 × 10^−3    8.94 × 10^−6    1.60 × 10^−3
         Outside   8.46 × 10^−2    1.49 × 10^−3    2.27 × 10^−2
4        Inside    1.46 × 10^−4    8.15 × 10^−9    3.90 × 10^−5
         Outside   6.76 × 10^−2    5.53 × 10^−4    2.20 × 10^−2
Note: STD is the standard deviation.
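A sketch of Algorithm 1 for a linear first-order IVP of the form (7) follows. Each segment gets its own LS-SVM model from the dual system (9). The value of the previous model at the segment end serves as the next initial condition (step 5). Test points are routed to their segment by a binary search over the segment edges (step 7). The sketch assumes the RBF kernel and equidistant local collocation points, and its names are hypothetical. Section V-B reports $\gamma = 10^5$ for the large-interval experiments, so $\gamma$ is left as a parameter here; the short demo keeps the default $10^{10}$.

```python
import numpy as np


def rbf_derivatives(u, v, sigma):
    """K, dK/du, dK/dv, d2K/dudv for K(u, v) = exp(-(u - v)^2 / sigma^2) on the grid u x v."""
    d = np.asarray(u, float)[:, None] - np.asarray(v, float)[None, :]
    K = np.exp(-d**2 / sigma**2)
    return (K, -2.0 * d / sigma**2 * K, 2.0 * d / sigma**2 * K,
            (2.0 / sigma**2 - 4.0 * d**2 / sigma**4) * K)


def fit_segment(f1, r, p1, t, sigma, gamma):
    """Dual system (9) on one segment t[0] < ... < t[-1] with y(t[0]) = p1; returns y_hat."""
    n = len(t) - 1
    O00, O10, O01, O11 = rbf_derivatives(t, t, sigma)
    f = f1(t[1:])
    D = np.diag(f)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = (O11[1:, 1:] - D @ O01[1:, 1:] - O10[1:, 1:] @ D
                 + D @ O00[1:, 1:] @ D + np.eye(n) / gamma)
    A[:n, n] = A[n, :n] = O01[0, 1:] - f * O00[0, 1:]
    A[:n, n + 1] = A[n + 1, :n] = -f
    A[n, n], A[n, n + 1], A[n + 1, n] = O00[0, 0], 1.0, 1.0
    sol = np.linalg.solve(A, np.concatenate([r(t[1:]), [p1, 0.0]]))
    alpha, beta, b = sol[:n], sol[n], sol[n + 1]

    def y_hat(s):
        S00, S10, _, _ = rbf_derivatives(t, s, sigma)
        return alpha @ (S10[1:] - f[:, None] * S00[1:]) + beta * S00[0] + b

    return y_hat


def piecewise_lssvm(f1, r, p1, a, c, S, N, sigma, gamma=1e10):
    """Algorithm 1: S consecutive segments, each with N local collocation points."""
    delta = (c - a) / S
    edges = a + delta * np.arange(S + 1)
    models, y_in = [], p1
    for k in range(S):                                     # steps 3-6
        t_local = np.linspace(edges[k], edges[k + 1], N)
        model = fit_segment(f1, r, y_in, t_local, sigma, gamma)
        models.append(model)
        y_in = model(np.array([edges[k + 1]]))[0]          # y_in := y_k(t_f)

    def y_hat(s):                                          # step 7
        s = np.atleast_1d(np.asarray(s, float))
        seg = np.clip(np.searchsorted(edges, s, side="right") - 1, 0, S - 1)
        out = np.empty_like(s)
        for k in np.unique(seg):
            out[seg == k] = models[k](s[seg == k])
        return out

    return y_hat


# Problem 1 on [0, 50]: 10 segments with 20 local collocation points each
y_hat = piecewise_lssvm(lambda s: -2.0 * np.ones_like(s), np.sin, 1.0, 0.0, 50.0,
                        S=10, N=20, sigma=1.0)
s = np.linspace(0.0, 50.0, 1001)
exact = 0.4 * np.sin(s) - 0.2 * np.cos(s) + 1.2 * np.exp(-2.0 * s)
print("max abs error on [0, 50]:", np.max(np.abs(y_hat(s) - exact)))
```

Chaining the segments through $y_{\mathrm{in}}$ keeps every local system small: $(N+1) \times (N+1)$ instead of one system that grows with the length of the interval. This is consistent with the CPU times reported for Problem 1 on long intervals.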
B. Parameter Tuning

The performance of the LS-SVM model depends on the choice of the tuning parameters. In this paper, the Gaussian RBF kernel is used for all experiments. Therefore, a model is determined by the regularization parameter γ and the kernel bandwidth σ.

It should be noted that, unlike the regression case, we do not have target values, and consequently we do not have noise. Therefore, a large value should be chosen for the regularization constant γ so that the error e is sharply minimized or, equivalently, the constraints are well satisfied. In all the experiments, the chosen value for γ was 10^10, except for the problem with a large interval, for which γ is set to 10^5 in order to avoid ill conditioning.

Therefore, the only parameter left to be tuned is the kernel bandwidth. In this paper, the optimal values of σ are obtained by evaluating the performance of the model on a validation set over a meaningful range of candidate values of σ. The validation set is defined to be the set of midpoints $V \equiv \{v_i = (t_i + t_{i+1})/2,\ i = 1, \ldots, N - 1\}$, where $\{t_i\}_{i=1}^N$ are the training points. The values that minimize the mean squared error (MSE) on this validation set are then selected.

Remark 5.1: In some cases, an extremely large value of γ, normally greater than 10^7, can make the matrix in (9) close to singular.

VI. NUMERICAL RESULTS

In this section, we test the performance of the proposed method on seven problems: four first-order and three second-order ODEs. For the first three problems and Problem 5, a comparison is made between the solutions obtained in [19] and our computed solutions. The numerical results of Problems 4 and 6 are compared with those given in [18]. Problem 7, a singular problem whose solution has no closed form in terms of elementary functions, is solved and the computed solution is compared with that reported in [30]. In order to show the approximation and generalization capabilities of the proposed method, we compare the exact solution with the computed solution inside and outside of the domain of consideration. Furthermore, the proposed method is successfully applied to solve Problem 1 over a very large time interval. For all experiments, the RBF kernel $K(u, v) = \exp(-(u - v)^2/\sigma^2)$ is used, so the following relations hold:

$$\begin{aligned}
\nabla_1^0[K(u,v)] &= -\frac{2(u - v)}{\sigma^2} K(u,v)\\
\nabla_0^1[K(u,v)] &= \frac{2(u - v)}{\sigma^2} K(u,v)\\
\nabla_2^0[K(u,v)] &= \Big(\frac{4(u - v)^2}{\sigma^4} - \frac{2}{\sigma^2}\Big) K(u,v).
\end{aligned}$$

MATLAB 2010b was used to implement the code, and all computations were carried out on a Windows 7 system with an Intel Core i7 CPU and 4.00 GB RAM.

A. First-Order ODEs

Problem 1: Consider the following first-order ODE [19, eq. 2]:

$$\frac{d}{dt} y(t) + 2y(t) = \sin(t), \quad y(0) = 1, \quad t \in [0, 10].$$

The approximate solution obtained by the proposed method is compared with the true solution, and the results are depicted in Fig. 1. The results show that our method outperforms the method in [19] in terms of accuracy (see [19, Fig. 6]), although training was performed using fewer points (one fourth). In addition, we also considered points outside the training interval, and Fig. 1(d) and (e) show that the extrapolation error remains low for points near the domain of the equation. As expected, increasing the number of mesh points (training points) decreases the error both inside and outside of the training interval. Fig. 1(c) and (f) show the performance of the method when nonuniform partitioning is used for creating the training points.

Problem 2: First-order differential equation with nonlinear sinusoidal excitation [19, eq. 3]:

$$\frac{d}{dt} y(t) + 2y(t) = t^3 \sin\Big(\frac{t}{2}\Big), \quad y(0) = 1, \quad t \in [0, 10].$$

The interval [0, 10] is discretized into N = 20 points $t_1 = 0, \ldots, t_{20} = 10$ using the grid $t_i = (i - 1)h$, $i = 1, \ldots, N$, where $h = 10/(N - 1)$. In Fig. 2(a), we compare the exact solution with the computed solution at the grid points (circles) as well as at other points inside and outside the domain of the equation. The absolute errors obtained for points inside and outside the domain [0, 10] are tabulated in Table I. The solution is clearly more accurate than the solution obtained in [19], even though fewer training points are used. (Note that in [19], 100 equidistant points are used for training and the maximum absolute error shown in [19, Fig. 13] is approximately 25 × 10^−2.)

TABLE II
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEMS 5–7

Fig. 1. Numerical results for Problem 1. (a) Ten equidistant points in [0, 10] are used for training. (b) 25 equidistant points in [0, 10] are used for training. (c) Nonuniform partitions of [0, 10] using ten points, which are used for training. (d) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into nine equal parts. (e) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into 24 equal parts. (f) Obtained absolute errors on the interval [0, 12] when [0, 10] is discretized into nine nonuniform parts.

Fig. 2. (a) Numerical results for Problem 2. Twenty equidistant points in [0, 10] are used for training. (b) Numerical results for Problem 3. Twenty equidistant points in [0, 0.5] are used for training. (c) Numerical results for Problem 4. Ten equidistant points in [0, 1] are used for training.

Problem 3: Consider the following nonlinear first-order ODE, which has no analytic solution [19, eq. 6]:

$$\frac{d}{dt} y(t) = y(t)^2 + t^2, \quad y(0) = 1, \quad t \in [0, 0.5].$$

Twenty equidistant points in the given interval are used for the training process. The approximate solution obtained by the proposed method and the solution obtained by the MATLAB built-in solver ODE45 are displayed in Fig. 2(b). The absolute errors obtained for points inside and outside the domain [0, 0.5] are tabulated in Table I. The proposed method is more accurate than the method described in [19], even though far fewer training points are used. (Note that in [19], the problem is solved over the domain [0, 0.2] using 100 equidistant training points, and the maximum absolute error shown in [19, Fig. 22] is approximately 4 × 10^−2.)

Fig. 3. (a) Numerical results for Problem 5. Ten equidistant points in [0, 1] are used for training. (b) Numerical results for Problem 6. Ten equidistant points in [0, 2] are used for training. (c) Numerical results for Problem 7. Ten equidistant points in [0, 1] are used for training.

TABLE III
NUMERICAL RESULT OF THE PROPOSED METHOD FOR SOLVING PROBLEM 1 WITH TIME INTERVAL [0, 10^5]. N IS THE NUMBER OF LOCAL COLLOCATION POINTS, AND S IS THE NUMBER OF SUBDOMAINS

N    S      CPU time   MSE (Training)   MSE (Test)
20   1000    5.5       2.4 × 10^−2      7.2 × 10^−2
20   2000   10.6       1.3 × 10^−3      3.3 × 10^−3
20   5000   29.5       8.4 × 10^−8      2.3 × 10^−7
30   1000    6.6       2.2 × 10^−2      5.9 × 10^−2
30   2000   13.4       4.1 × 10^−6      1.3 × 10^−5
30   5000   37.1       8.2 × 10^−9      2.7 × 10^−8
40   1000    9.6       5.8 × 10^−4      1.4 × 10^−3
40   2000   20.1       1.7 × 10^−7      5.8 × 10^−7
40   5000   54.2       2.3 × 10^−9      8.1 × 10^−9
Note: The execution time is in seconds.

Problem 4: Consider the following first-order ODE with a time-varying coefficient [18, eq. 1]:

$$\frac{d}{dt} y(t) + \Big(t + \frac{1 + 3t^2}{1 + t + t^3}\Big) y(t) = t^3 + 2t + t^2\,\frac{1 + 3t^2}{1 + t + t^3}, \quad y(0) = 1, \quad t \in [0, 1].$$

In order to have a fair comparison with the results reported in [18], ten equidistant points in the given interval are used for the training process. The analytic solution and the solution obtained via our proposed method are displayed in Fig. 2(c). The absolute errors obtained for points inside and outside the domain [0, 1] are recorded in Table I, which shows the superiority of the proposed method over the method described in [18]. (Note that in [18, Fig. 2], the maximum absolute error outside the domain [0, 1] is approximately 12 × 10^−2.)

B. Second-Order ODEs

Problem 5: Consider the following second-order BVP with a time-varying input signal [19, eq. 4]:

$$\frac{d^2}{dt^2} y(t) + y(t) = 2 + 2\sin(4t)\cos(3t), \quad y(0) = 1, \quad y(1) = 0.$$

Ten equidistant points in the given interval are used for the training process. The analytic solution and the solution obtained via our proposed method are displayed in Fig. 3(a). The absolute errors obtained for points inside and outside the domain [0, 1] are recorded in Table II, which shows the superiority of the proposed method over the method described in [19]. (Note that in [19], 100 equidistant points are used for training and the maximum absolute error shown in [19, Fig. 17] is approximately 5 × 10^−1.)

Problem 6: Consider the following second-order ODE with a time-varying input signal [18, Problem 3]:

$$\frac{d^2}{dt^2} y(t) + \frac{1}{5}\frac{d}{dt} y(t) + y(t) = -\frac{1}{5} e^{-t/5}\cos(t), \quad y(0) = 1, \quad y'(0) = 1.$$

Ten equidistant points in the interval [0, 2] are used for the training process. The analytic solution and the solution obtained by the proposed method are shown in Fig. 3(b). The absolute errors obtained for points inside and outside the domain [0, 2] are tabulated in Table II, which again shows the improvement of the proposed method over the method described in [18]. (Note that in [18, Fig. 4], the maximum absolute error outside the domain [0, 2] is 8 × 10^−4.)

Problem 7: Consider the following singular second-order ODE [30, eq. 1], whose solution has no closed form in terms of elementary functions:

$$\frac{d^2}{dt^2} y(t) + \frac{1}{t}\frac{d}{dt} y(t) - \frac{1}{t}\cos(t) = 0, \quad y(0) = 0, \quad y'(0) = 1.$$

The exact solution is the sine integral $y(t) = \int_0^t \frac{\sin(x)}{x}\, dx$. Ten equidistant points in the interval [0, 1] are used as training points, and the obtained results are shown in Fig. 3(c) and recorded in Table II. The maximum absolute error obtained outside the domain [0, 1] is 6.51 × 10^−2, which is smaller than the 14 × 10^−1 shown in [30, Fig. 6].

C. Sensitivity of the Solution w.r.t. the Parameter

In order to illustrate the sensitivity of the result with respect to the model parameter σ, we have plotted, for two examples, the MSE on the validation set versus the kernel
TABLE IV
NUMERICAL RESULTS OF THE PROPOSED METHOD FOR SOLVING PROBLEM 1 WITH TIME INTERVAL [0, 4000], WHILE THE TOTAL NUMBER OF COLLOCATION POINTS, i.e., N × S, IS CONSTANT

N     S     CPU time   MSE (Training)   MSE (Test)
800   10    85.5       1.36 × 10^−8     2.06 × 10^−8
400   20    26.1       1.37 × 10^−8     2.08 × 10^−8
20    400    2.06      1.68 × 10^−8     2.52 × 10^−8
Note: The execution time is in seconds.

Fig. 4. Sensitivity of the obtained result with respect to model parameter σ. log10(MSE) versus log10 σ is plotted for Problems 2 and 6.
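Section V-B selects σ by the MSE on the midpoint validation set V, but the text does not state which quantity that MSE measures. The sketch below assumes it is the ODE residual $\hat y'(v) - f_1(v)\hat y(v) - r(v)$ at the midpoints, which needs no reference solution. For problems with a known exact solution, such as those in Fig. 4, the error against that solution could be used instead. The sketch reuses the first-order system (9) with the RBF kernel. The derivative model is obtained by differentiating the dual model with respect to its second kernel argument, in the same way as the $d\hat y/dt$ expressions in Section III. The names and the σ grid are hypothetical.

```python
import numpy as np


def rbf_derivatives(u, v, sigma):
    """K, dK/du, dK/dv, d2K/dudv for K(u, v) = exp(-(u - v)^2 / sigma^2) on the grid u x v."""
    d = np.asarray(u, float)[:, None] - np.asarray(v, float)[None, :]
    K = np.exp(-d**2 / sigma**2)
    return (K, -2.0 * d / sigma**2 * K, 2.0 * d / sigma**2 * K,
            (2.0 / sigma**2 - 4.0 * d**2 / sigma**4) * K)


def fit_first_order(f1, r, p1, t, sigma, gamma=1e10):
    """System (9); returns callables for y_hat(s) and its derivative dy_hat/ds."""
    n = len(t) - 1
    O00, O10, O01, O11 = rbf_derivatives(t, t, sigma)
    f = f1(t[1:])
    D = np.diag(f)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = (O11[1:, 1:] - D @ O01[1:, 1:] - O10[1:, 1:] @ D
                 + D @ O00[1:, 1:] @ D + np.eye(n) / gamma)
    A[:n, n] = A[n, :n] = O01[0, 1:] - f * O00[0, 1:]
    A[:n, n + 1] = A[n + 1, :n] = -f
    A[n, n], A[n, n + 1], A[n + 1, n] = O00[0, 0], 1.0, 1.0
    sol = np.linalg.solve(A, np.concatenate([r(t[1:]), [p1, 0.0]]))
    alpha, beta, b = sol[:n], sol[n], sol[n + 1]

    def y_hat(s):
        S00, S10, _, _ = rbf_derivatives(t, s, sigma)
        return alpha @ (S10[1:] - f[:, None] * S00[1:]) + beta * S00[0] + b

    def dy_hat(s):  # derivative of y_hat with respect to s, the second kernel argument
        _, _, S01, S11 = rbf_derivatives(t, s, sigma)
        return alpha @ (S11[1:] - f[:, None] * S01[1:]) + beta * S01[0]

    return y_hat, dy_hat


def select_sigma(f1, r, p1, t, sigmas):
    """Pick the sigma that minimises the ODE-residual MSE on the midpoints v_i = (t_i + t_{i+1}) / 2."""
    v = 0.5 * (t[:-1] + t[1:])
    scores = []
    for sigma in sigmas:
        y_hat, dy_hat = fit_first_order(f1, r, p1, t, sigma)
        residual = dy_hat(v) - f1(v) * y_hat(v) - r(v)
        scores.append(np.mean(residual**2))
    scores = np.array(scores)
    return sigmas[int(np.argmin(scores))], scores


# Grid search for Problem 1 with 25 equidistant training points
t = np.linspace(0.0, 10.0, 25)
sigmas = np.array([0.5, 1.0, 1.5, 2.0])
best, scores = select_sigma(lambda s: -2.0 * np.ones_like(s), np.sin, 1.0, t, sigmas)
print("selected sigma:", best, "validation MSE per sigma:", scores)
```

Plotting `np.log10(scores)` against `np.log10(sigmas)` gives a curve of the same kind as Fig. 4. It will not necessarily match the paper's values, since the validation quantity here is an assumption.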
[11] A. J. Meade and A. A. Fernandez, “The numerical solution of linear ordinary differential equations by feedforward neural networks,” Math. Comput. Model., vol. 19, no. 12, pp. 1–25, 1994.
[12] H. Lee and I. Kang, “Neural algorithms for solving differential equations,” J. Comput. Phys., vol. 91, no. 1, pp. 110–117, 1990.
[13] B. P. van Milligen, V. Tribaldos, and J. A. Jiménez, “Neural network differential equation and plasma equilibrium solver,” Phys. Rev. Lett., vol. 75, no. 20, pp. 3594–3597, 1995.
[14] L. P. Aarts and P. Van der Veer, Solving Nonlinear Differential Equations by a Neural Network Method (Lecture Notes in Computer Science), vol. 2074. New York: Springer-Verlag, 2001, pp. 181–189.
[15] P. Ramuhalli, L. Udpa, and S. S. Udpa, “Finite-element neural networks for solving differential equations,” IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1381–1392, Nov. 2005.
[16] K. S. McFall and J. R. Mahan, “Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions,” IEEE Trans. Neural Netw., vol. 20, no. 8, pp. 1221–1233, Aug. 2009.
[17] I. G. Tsoulos, D. Gavrilis, and E. Glavas, “Solving differential equations with constructed neural networks,” Neurocomputing, vol. 72, nos. 10–12, pp. 2385–2391, 2009.
[18] I. Lagaris, A. Likas, and D. I. Fotiadis, “Artificial neural networks for solving ordinary and partial differential equations,” IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987–1000, Sep. 1998.
[19] H. S. Yazdi, M. Pakdaman, and H. Modaghegh, “Unsupervised kernel least mean square algorithm for solving ordinary differential equations,” Neurocomputing, vol. 74, nos. 12–13, pp. 2062–2071, 2011.
[20] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.

Siamak Mehrkanoon received the B.S. degree in pure mathematics in 2005 and the M.S. degree in applied mathematics from the Iran University of Science and Technology, Tehran, Iran, in 2007. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium. His current research interests include machine learning, system identification, pattern recognition, and numerical algorithms.

Tillmann Falck received the Dipl.Ing. degree in electrical engineering from Ruhr University Bochum, Bochum, Germany, in 2007. He is pursuing the Ph.D. degree in electrical engineering from Katholieke Universiteit Leuven, Leuven, Belgium. He has been with Robert Bosch GmbH, Wuerttemberg, Germany, since 2012. His research interests include nonlinear system identification, convex optimization, machine learning, and tracking and driver assistance systems.

Johan A. K. Suykens (SM’05) was born in Willebroek, Belgium, on May 18, 1966. He received the M.S. degree in electromechanical engineering and the Ph.D. degree in applied sciences from the
[21] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998. Katholieke Universiteit Leuven, Leuven, Belgium, in
[22] J. A. K. Suykens and J. Vandewalle, “Least squares support vector 1989 and 1995, respectively.
machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, He was a Visiting Post-Doctoral Researcher with
1999. the University of California, Berkeley, in 1996. He
[23] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. has been a Post-Doctoral Researcher with the Fund
Vandewalle, Least Squares Support Vector Machines. Singapore: World for Scientific Research FWO Flanders and is cur-
Scientific, 2002. rently a Professor in Hoogleraar with K.U. Leuven.
[24] J. A. K. Suykens, J. Vandewalle, and B. De Moor, “Optimal control by He has authored the books Artificial Neural Networks for Modelling and
least squares support vector machines,” Neural Netw., vol. 14, no. 1, pp. Control of Non-linear Systems (Kluwer Academic Publishers) and Least
23–35, 2001. Squares Support Vector Machines (World Scientific), co-authored the book
[25] K. De Brabanter, J. De Brabanter, J. A. K. Suykens, and B. De Cellular Neural Networks, Multi-Scroll Chaos and Synchronization (World
Moor, “Approximate confidence and prediction intervals for least squares Scientific), and edited the books Nonlinear Modeling: Advanced Black-Box
support vector regression,” IEEE Trans. Neural Netw., vol. 22, no. 1, pp. Techniques (Kluwer Academic Publishers) and Advances in Learning Theory:
110–120, Jan. 2011. Methods, Models and Applications (IOS Press). In 1998, he organized an
[26] H. Ning, X. Jing, and L. Cheng, “Online identification of nonlinear International Workshop on Nonlinear Modeling with Time-series Prediction
spatiotemporal systems using kernel learning approach,” IEEE Trans. Competition.
Neural Netw., vol. 22, no. 9, pp. 1381–1394, Sep. 2011. Dr. Suykens has served as an Associate Editor for the IEEE T RANSAC -
TIONS ON C IRCUITS AND S YSTEMS from 1997 to 1999 and from 2004 to
[27] M. Lázaro, I. Santamaria, F. Pérez-Cruz, and A. Artés-Rodriguez, “Sup-
2007, and for the IEEE T RANSACTIONS ON N EURAL N ETWORKS from 1998
port vector regression for the simultaneous learning of a multivariate
to 2009. He was a recipient of the IEEE Signal Processing Society Best
function and its derivative,” Neurocomputing, vol. 69, nos. 1–3, pp.
Paper (Senior) Award in 1999 and several Best Paper Awards at International
42–61, 2005.
Conferences. He was a recipient of the International Neural Networks Society
[28] D. R. Kincaid and E. W. Cheney, Numerical Analysis: Mathematics INNS 2000 Young Investigator Award for significant contributions in the field
of Scientific Computing, 3rd ed. Pacific Grove, CA: Brooks/Cole, of neural networks. He has served as a Director and Organizer of the NATO
2002. Advanced Study Institute on Learning Theory and Practice (Leuven, 2002),
[29] A. Toselli and O.B. Widlund, Domain Decomposition as a Program Co-Chair for the International Joint Conference on Neural
Methods-Algorithms and Theory. Berlin, Germany: Springer-Verlag, Networks in 2004 and the International Symposium on Nonlinear Theory
2005. and its Applications 2005, as an organizer of the International Symposium
[30] I. G. Tsoulos and I. E. Lagaris, “Solving differential equations with on Synchronization in Complex Networks in 2007 and a co-organizer of the
genetic programming,” Genetic Program. Evolvable Mach., vol. 7, no. 1, NIPS 2010 workshop on Tensors, Kernels and Machine Learning. He was
pp. 33–54, 2006. awarded an ERC Advanced Grant in 2011.