Zhigangyu 2006
* This work is supported by foundation HIT2002.12.
minimization. We present a method for the design of RBF neural networks that addresses the overfitting problem. In practical applications, frequency information is usually available for the design of RBF networks through frequency-domain analysis, which has a sound mathematical basis. We incorporate this frequency information into the design of RBF neural networks, which then achieve the task of approximating a function in a certain frequency range. This use of Fourier series is the first motivation of this study.

II. THE DESIGN OF STRUCTURAL RISK MINIMIZATION OF RBF NETWORKS FOR AVOIDING OVERFITTING

A. The overfitting problem of RBF networks

Fig. 1 The neural network with a bad generalization property: a neural network model with overfitting (input of the NN at the bottom, output of the NN at the top; plot omitted).

For a problem to be solved, the physical phenomenon responsible for generating the training data (e.g., speech, pictures, radar signals, sonar signals, seismic data) is a well-posed direct problem. However, learning from such physical data, viewed as a multidimensional mapping-reconstruction problem, is an ill-posed inverse problem, for several reasons [7].

There is no way to overcome this difficulty unless some prior information about the input-output mapping is available. In this context, it is rather appropriate that we remind ourselves of a statement made by Lanczos [8]: "A lack of information cannot be remedied by any mathematical trickery." The important issue of how to utilize the useful information available to design RBF networks is discussed in this section. The idea for avoiding overfitting is to design RBF networks that approximate certain frequency components, but traditional methods cannot ascertain the frequency properties of RBF networks.

To develop a deeper description of the overfitting problem, we first state the interpolation problem, in the strict sense, as follows: given a set of N different points {x_i ∈ R^m | i = 1, …, N} and a corresponding set of N real numbers {d_i ∈ R | i = 1, …, N}, find a function f: R^m → R that satisfies the interpolation condition f(x_i) = d_i, i = 1, …, N. The interpolation function is constrained to pass through all the training data points. The RBF technique consists of choosing a function f(x) that has the following model:
f(x, θ) = ∑_{i=1}^{h} w_i φ_i(x),

where, for an m-h-l RBF network, m is the number of input neurons, h is the number of hidden neurons, and w_i is the weight connecting the i-th hidden unit to the output.

B. The design for RBF network of structural risk minimization
In L²(R^m) = { f(x) | ∫ |f(x)|² dt < +∞ }, let f(x): S → R,
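As an illustrative sketch of the strict interpolation stated above — one Gaussian basis function per training point, so that f(x_i) = d_i holds exactly — the following is a minimal example; the two data points and the width σ are assumptions for illustration, not values from the paper:

```python
import math

def phi(x, c, sigma):
    # Gaussian radial basis function centered at c with width sigma
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Two training points; one basis function per point (strict interpolation).
x1, x2 = 0.0, 1.0
d1, d2 = 1.0, -1.0
sigma = 0.5

# Solve the 2x2 system  [phi11 phi12; phi21 phi22] [w1; w2] = [d1; d2]
a, b = phi(x1, x1, sigma), phi(x1, x2, sigma)
c, d = phi(x2, x1, sigma), phi(x2, x2, sigma)
det = a * d - b * c
w1 = (d1 * d - b * d2) / det
w2 = (a * d2 - c * d1) / det

def f(x):
    # RBF model f(x, theta) = sum_i w_i * phi_i(x)
    return w1 * phi(x, x1, sigma) + w2 * phi(x, x2, sigma)
```

By construction the fitted curve passes through both training points, which is exactly the property that makes strict interpolation prone to overfitting when the data are noisy.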
ω₀, …, ω_n are the base frequencies for the corresponding input variables. Nonlinear functions f(x) usually are not periodic.

Fig. 2 The fitting result of a cosine function with period T by three Gaussian radial-basis functions (plot omitted).

The sum-of-squares error is ε = ∑_{p=1}^{100} ( d_p − y_p )² = 0.027.

It is obvious that a single-period cosine function within [−3, 3] can be fitted quite well by three Gaussian radial-basis functions. For a cosine function with period T, the centers of the RBFs are located at −3T/4, 0, and 3T/4, respectively, and their widths satisfy the empirical formula σ²_{p_i ω_j} = 0.35 (T/2)^1.75, T ∈ [1.04, 3.14]. Moreover, after initially selecting the centers and widths of the RBFs, the approximation accuracy can be further improved with a gradient-descent algorithm. Using initial values of the weighting parameters in the vicinity of these values can accelerate their convergence.

D. The structure of Gaussian radial basis function neural network based on N-dimensional Fourier series

We have an insight into the property of Gaussian radial-basis functions through the n-dimensional Fourier series

f(X) = ∑_{p₁=0}^{N₁} ⋯ ∑_{p_n=0}^{N_n} ( w_pc cos(PX) + w_ps sin(PX) ),

where C_i is an n × n norm weighting matrix, n is the dimension of the input vector X,

C_pᵀ C_p = diag( c₁₁ᵖ, c₂₂ᵖ, …, c_nnᵖ ),

c_jjᵖ = σ₁ σ₂ ⋯ σ_{j−1} σ_{j+1} ⋯ σ_n ,  j = 1, 2, …, n,

so that

‖C_p (X − c)‖² = σ₂ ⋯ σ_n (x₁ − c₁)² + σ₁ σ₃ ⋯ σ_n (x₂ − c₂)² + ⋯ + σ₁ σ₂ ⋯ σ_{n−1} (x_n − c_n)².

Specifically, we can get:

(1) The number of hidden units is N = ∏_{i=1}^{n} ∑_{j=0}^{N_i} ⌊ 2(b_i − a_i) / T_ij ⌋, where N_i = ⌊ B_i T_j / (2π) ⌋.

(2) The centers are located at the points c^{p_j ω_k}_{k, r_mi} = π r_mi / (p_j ω_k) + kπ / (4 p_j ω_k), where for the cosine function cos( p_j ω_k x_i + kπ/2 ), k = 0, 1. The corresponding width is σ²_{p_j ω_k} = 0.35 ( τ_jk / 2 )^1.75, where τ_jk = 2π / ( p_j ω_k ).

E. Learning algorithm for RBF neural network

In general, a good learning algorithm must have fast learning as well as good computational capacity and generalization capacity. The number of adjusted weights is n.
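The Fourier-series viewpoint underlying this design can be illustrated numerically in one dimension: a band-limited signal is recovered exactly once the truncation covers its highest frequency. The test signal g, the midpoint-rule quadrature, and the sample count are illustrative assumptions, not part of the paper:

```python
import math

def fourier_approx(g, x, n_terms, T=2.0 * math.pi, m=400):
    # Truncated Fourier series of g on [-T/2, T/2]; coefficients are
    # computed by a simple midpoint-rule quadrature with m sample points.
    w0 = 2.0 * math.pi / T
    ts = [-T / 2.0 + T * (i + 0.5) / m for i in range(m)]
    y = sum(g(t) for t in ts) / m                  # a0 (mean value)
    for k in range(1, n_terms + 1):
        ak = 2.0 * sum(g(t) * math.cos(k * w0 * t) for t in ts) / m
        bk = 2.0 * sum(g(t) * math.sin(k * w0 * t) for t in ts) / m
        y += ak * math.cos(k * w0 * x) + bk * math.sin(k * w0 * x)
    return y

# Signal containing only frequencies 1 and 3 (relative to w0 = 1).
g = lambda t: math.sin(t) + 0.5 * math.cos(3.0 * t)
```

With n_terms = 3 the reconstruction matches g essentially to machine precision, while n_terms = 1 misses the third harmonic entirely — the frequency content determines how many terms (and, in the RBF construction above, how many hidden units) are needed.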
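The empirical recipe above — centers at −3T/4, 0, 3T/4 and width σ² = 0.35(T/2)^1.75 — can be sketched as follows. The choice T = 2 (inside the stated range [1.04, 3.14]) and the three collocation points used to fix the weights are illustrative assumptions:

```python
import math

T = 2.0                                   # assumed period, within [1.04, 3.14]
centers = [-3.0 * T / 4.0, 0.0, 3.0 * T / 4.0]
sigma2 = 0.35 * (T / 2.0) ** 1.75         # empirical width formula

def phi(x, c):
    return math.exp(-((x - c) ** 2) / (2.0 * sigma2))

target = lambda x: math.cos(2.0 * math.pi * x / T)

# Fix the weights by collocation at three points of one period (illustrative);
# solve the 3x3 linear system by Gaussian elimination with partial pivoting.
pts = [-T / 2.0, 0.0, T / 2.0]
M = [[phi(x, c) for c in centers] + [target(x)] for x in pts]
for k in range(3):
    p = max(range(k, 3), key=lambda r: abs(M[r][k]))
    M[k], M[p] = M[p], M[k]
    for r in range(k + 1, 3):
        fac = M[r][k] / M[k][k]
        for col in range(k, 4):
            M[r][col] -= fac * M[k][col]
w = [0.0, 0.0, 0.0]
for k in (2, 1, 0):
    w[k] = (M[k][3] - sum(M[k][j] * w[j] for j in range(k + 1, 3))) / M[k][k]

def f(x):
    # Three-Gaussian approximation of the cosine
    return sum(wi * phi(x, c) for wi, c in zip(w, centers))
```

These weights reproduce the cosine exactly at the collocation points; in the paper's procedure they would then be refined further by gradient descent.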
Finding the optimum parameters of a linear mapping is a linear optimization problem.

Here, the weights w_kp between the hidden layer and the output layer are initialized randomly. The training goal is to minimize the following objective function:

E = (1/l) ∑_{i=1}^{l} e_i² = (1/l) ∑_{i=1}^{l} ( y_i − f(x_i, θ) )²,

where y_i is the desired output. The adaptation formulas for the linear weights of the RBF network are

∂E(n) / ∂w_kp(n) = ∑_{i=1}^{M} ∑_{j=0}^{2(b_i − a_i)/T_ij} e_i(n) φ(x_j),

w_kp(n + 1) = w_kp(n) − η ∂E(n) / ∂w_kp(n).

Therefore, we can obtain the correct mapping relationship of the Fourier neural networks after getting the correct weights between the hidden layer and the output-layer units. The linear weights of the output layer are the only set of adjustable parameters. In doing so, the likelihood of converging to an undesirable local minimum in the center space and the width space is reduced. The centers and widths of the RBFs were kept fixed during the learning process to avoid overfitting.

B_x = 2.51, B_y = 3.77, T_x = 2π/B_x = 2.5, T_y = 2π/B_y = 1.67. For the sine function sin(0.8πx), the number of centers is r_x = 2, by r_x ≥ 2(b − a)/T_x. For the cosine function cos(1.2πy), the number of centers is r_y = 3, by r_y ≥ 2(b − a)/T_y. The total number of centers is r = r_x r_y = 6. The center vectors are

c₁ = ( c^{0.8π}_{0,1}, c^{1.2π}_{1,1} ),
c₂ = ( c^{0.8π}_{0,1}, c^{1.2π}_{1,2} ),
… ,
c₆ = ( c^{0.8π}_{0,2}, c^{1.2π}_{1,3} ),

where c^{p_j ω_k}_{k, r_mi} = π r_mi / (p_j ω_k) + kπ / (4 p_j ω_k), p_i ω_j ∈ {0.8π, 1.2π}, 0 ≤ p_i ≤ N_i, i ∈ {1, 2, …, n}, and k ∈ {0, 1}. According to the formula σ²_{p_i ω_j} = 0.35 ( τ_ij / 2 )^1.75, where τ_ij ∈ {2.5, 1.67}, these widths are σ²_{0.8π} = 0.52 and σ²_{1.2π} = 0.26. The initial structure of the network is a 2-6-1 network. The approximation accuracy of the RBF network is evaluated by the formula

ε = ∑_{x=1}^{128} e(x)².
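The numbers in this example can be reproduced with a short computation. The interval length b − a = 2.5 is an assumption chosen to be consistent with r_x = 2 and r_y = 3, since a and b are not stated explicitly here:

```python
import math

# Reproducing the worked example's numbers for sin(0.8*pi*x) and cos(1.2*pi*y).
Bx, By = 0.8 * math.pi, 1.2 * math.pi            # base frequencies (~2.51, ~3.77)
Tx, Ty = 2.0 * math.pi / Bx, 2.0 * math.pi / By  # periods: 2.5 and ~1.67

length = 2.5                                 # assumed interval length b - a
rx = math.ceil(2.0 * length / Tx - 1e-9)     # smallest r with r >= 2(b-a)/Tx
ry = math.ceil(2.0 * length / Ty - 1e-9)     # (the 1e-9 guards against rounding)
r = rx * ry                                  # total number of centers

# Widths from the empirical formula sigma^2 = 0.35 * (tau/2)^1.75
s2x = 0.35 * (Tx / 2.0) ** 1.75
s2y = 0.35 * (Ty / 2.0) ** 1.75
```

This yields r_x = 2, r_y = 3, r = 6 hidden units (the 2-6-1 structure) and widths close to the quoted 0.52 and 0.26.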
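A minimal sketch of the update rule w_kp(n+1) = w_kp(n) − η ∂E(n)/∂w_kp(n), applied to the linear output weights only while the centers and widths stay fixed; the target function, training grid, learning rate, and iteration count are illustrative assumptions:

```python
import math

def phi(x, c, s2):
    # Gaussian basis with fixed center c and width s2 (not adapted)
    return math.exp(-((x - c) ** 2) / (2.0 * s2))

centers, s2 = [-1.5, 0.0, 1.5], 0.35            # illustrative fixed centers/widths
xs = [i / 10.0 - 1.0 for i in range(21)]        # training inputs on [-1, 1]
ds = [math.cos(math.pi * x) for x in xs]        # desired outputs y_i

def mse(w):
    # E = (1/l) * sum_i (y_i - f(x_i, theta))^2
    return sum((d - sum(wk * phi(x, c, s2) for wk, c in zip(w, centers))) ** 2
               for x, d in zip(xs, ds)) / len(xs)

w, eta = [0.0, 0.0, 0.0], 0.05
e0 = mse(w)
for _ in range(2000):
    # Accumulate dE/dw_k = -(2/l) * sum_i e_i * phi_k(x_i), then step:
    # w_kp(n+1) = w_kp(n) - eta * dE(n)/dw_kp(n)
    grads = [0.0] * 3
    for x, d in zip(xs, ds):
        err = d - sum(wk * phi(x, c, s2) for wk, c in zip(w, centers))
        for k, c in enumerate(centers):
            grads[k] -= 2.0 * err * phi(x, c, s2) / len(xs)
    w = [wk - eta * g for wk, g in zip(w, grads)]
```

Because only the output weights are trained, the objective is quadratic in w, so gradient descent converges toward the unique least-squares solution rather than an undesirable local minimum.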
Let T_i = 2, and extend the f(x₁, x₂, …, x₅) curves for the variables x₁, x₂, …, x₅ periodically on (−∞, ∞), so that f(x₁, x₂, …, x₅) becomes a periodic function.

The simulation result for the SRM-RBF neural network is shown in Fig. 3(a). For comparison, the experimental examples above are repeated using the multi-layer perceptron (MLP) as the identification model, trained with the delta-bar-delta rule with adaptive neuron gain and with the standard back-propagation algorithm [9]. The identified output and the actual output of the plant are given in Fig. 3(b) and (c) for the multi-layer perceptrons mentioned above. As shown in Fig. 3, the SRM-RBF neural network model outperforms the MLP model in terms of accuracy measured by MSE.

IV. CONCLUSIONS

The RBF neural network restructured by means of structural risk minimization has better generalization properties, minimizing the risk of overfitting. If the design of an RBF network lacks the necessary frequency information, the input-output mapping reconstructed by the learning algorithm to avoid overfitting may have nothing to do with the true solution. There is no way to overcome this difficulty unless some prior frequency information about the input-output mapping is available. The presented approach, which uses the frequency information usually available for the design of RBF networks, contrasts with the traditional approach. Simulation results demonstrate that the presented SRM-RBF network has overfitting-avoidance properties at least as good as Fourier approximation.
Fig. 3 Identification results: (a) output of the SRM-RBF network; (b), (c) outputs of the MLP models (plots omitted).

REFERENCES

[1] Robert J. Schilling, James J. Carroll, "Approximation of nonlinear systems with radial basis function neural networks," IEEE Transactions on Neural Networks, vol. 12, no. 1, pp. 1-15, January 2001.
[2] Irwin W. Sandberg, "Gaussian radial basis functions and the approximation of input-output maps," Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 3635-3639.