Khosravi2012 Article ANovelStructureForRadialBasisF
DOI 10.1007/s11063-011-9210-0
Hossein Khosravi
Abstract A novel structure for radial basis function (RBF) networks is proposed. In this structure,
unlike the traditional RBF network, we place weights between the input and hidden layers. These
weights, which take values around unity, are multiplication factors for the input vector and perform
a linear mapping. This increases the number of free parameters of the network, but since these
weights are trainable, the overall performance of the network improves significantly. Because of
the new weight vector, we call this structure Weighted RBF, or WRBF. The weight adjustment
formula is derived by applying the gradient descent algorithm. Two classification problems are
used to evaluate the performance of the new RBF network: letter classification using the UCI
dataset with 16 features, a difficult problem, and digit recognition using the HODA dataset with
64 features, an easy problem. WRBF is compared with the classic RBF and MLP networks, and our
experiments show that WRBF outperforms both significantly. For example, with 200 hidden
neurons, WRBF achieved a recognition rate of 92.78% on the UCI dataset while RBF and MLP
achieved 83.13% and 89.25% respectively. On the HODA dataset, WRBF reached a 97.94%
recognition rate whereas RBF achieved 97.14% and MLP 97.63%.
Keywords Radial basis · Neural network · RBF · WRBF · Classification · Gradient descent
1 Introduction
Classification is a key element of machine learning. Its purpose is to discriminate between
two or more classes of objects that have different features. Humans classify objects easily;
we see, experience and learn to recognize objects. In machine learning, however, algorithms
must be developed to recognize objects. The artificial neural network (ANN) is a popular
machine learning technique for classification. ANNs are inspired by biological neural
networks, but current models are very simple by comparison. Research is in progress to make
ANNs more like biological neural networks [1–4].
H. Khosravi (✉)
Department of Electrical and Robotic Engineering, Shahrood University of Technology, Shahrood, Iran
e-mail: Hosseinkhosravi@gmail.com
[Fig. 1: Gaussian function]
Artificial neural networks have been defined by Kohonen as “massively parallel interconnected
networks of simple (usually adaptive) elements and their hierarchical organizations, which
are intended to interact with the objects of the real world in the same way as biological
nervous systems do” [5]. Neural networks attempt to achieve good performance via a dense
mesh of computing nodes and connections [6].
Application areas of neural networks include pattern classification [7,8], function
approximation [9], system identification, vehicle control, quantum chemistry [10], decision
making, sequence recognition, medical diagnosis, financial applications, data mining,
visualization and e-mail spam filtering.
The radial basis function (RBF) network is a special type of neural network that was
originally used for regression problems, i.e. function approximation. In recent decades, RBF
networks have also proved well suited to classification problems and have produced valuable
results. RBF networks were first used for classification in 1964 [11]; since then they have
found their way into classification problems, and today the RBF network is known as a
reliable classifier [12–15].
An RBF network consists of three layers: the input layer, the radial basis function (hidden)
layer and the output layer. The input layer takes the features extracted from training samples
and delivers them directly to the RBF layer. The hidden neurons are radial basis functions.
These functions should satisfy four specifications, which are listed in [16].
A function commonly used in the RBF layer is the Gaussian function (Fig. 1):
$$y_m = f_m(X) = \exp\left(-\frac{\lVert X - \mu_m \rVert^2}{2\sigma_m^2}\right) \qquad (1)$$
Here $m$ is the index of the hidden neuron, $X$ is the input vector, $\mu_m$ is the mean vector
or prototype vector of the $m$th neuron and $\sigma_m$ is the spread parameter.
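As a minimal sketch of Eq. 1, the response of a single hidden neuron can be computed as follows. The function name and the sample values are illustrative assumptions, not part of the original implementation.

```python
import math

def gaussian_rbf(x, mu, sigma):
    """Eq. 1: exp(-||x - mu||^2 / (2 * sigma^2)) for one hidden neuron."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Illustrative values only (not taken from the paper).
center = [0.0, 0.0]
print(gaussian_rbf([0.0, 0.0], center, 1.0))  # response at the center is 1.0
print(gaussian_rbf([1.0, 0.0], center, 1.0))  # exp(-0.5), about 0.6065
```

The response is 1 at the prototype vector and decays with distance at a rate set by the spread parameter.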
In regression problems, the number of hidden neurons can equal the number of training
samples, so that the mean vector $\mu_m$ of each neuron coincides with one of the training
samples. In this way each Gaussian function covers part of the input space and, if the
training samples are suitably distributed, the network will perform well.
A Novel Structure for Radial Basis Function Networks—WRBF 179
[Fig. 2: a function constructed from a weighted sum of three Gaussian functions]
In classification problems, where we deal with large datasets, the number of hidden neurons
is generally much smaller than the number of training samples but still larger than in an MLP
classifier [17,18]. We discuss this in Sect. 4.
The output layer of an RBF network is the same as that of an MLP, and its neurons can use
several activation functions such as the sigmoid, linear or hyperbolic tangent [19]. The
connections between the hidden and output layers carry weights, which are trainable
parameters of the RBF network. In other words, a weighted sum of the radial basis functions,
applied to the input vector, is fed to the output layer. Figure 2 shows a function
constructed from a weighted sum of three Gaussian functions.
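The weighted sum that feeds the output layer can be sketched for the one-dimensional case as follows. The weights, centers and spreads are hypothetical, since the values behind Fig. 2 are not given in the text.

```python
import math

def weighted_gaussian_sum(x, weights, centers, sigmas):
    """Weighted sum of 1-D Gaussian basis functions evaluated at scalar x."""
    return sum(
        w * math.exp(-((x - c) ** 2) / (2.0 * s ** 2))
        for w, c, s in zip(weights, centers, sigmas)
    )

# Hypothetical parameters for three Gaussians.
weights = [1.0, 2.0, 1.5]
centers = [-2.0, 0.0, 2.0]
sigmas = [0.5, 0.5, 0.5]

for x in (-2.0, 0.0, 2.0):
    print(x, round(weighted_gaussian_sum(x, weights, centers, sigmas), 4))
```

With narrow, well-separated Gaussians such as these, the value at each center is dominated by the weight of the corresponding basis function.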
The common method for training an RBF network is the back-propagation algorithm, which was
originally proposed in [20,21]. Since then, several modifications of RBF networks have been
proposed. Some concern the training algorithm [22–24], and others the structure of the
network, such as the number of hidden neurons and the type of radial basis function [25–28].
Furthermore, some new networks were derived from the idea of the RBF network; e.g. WNN
(footnote 1) is a network like RBF whose activation functions are wavelets rather than
Gaussians (see [29,30] for details).
In this paper, we propose a structural modification of the RBF network that significantly
improves its results. This modification increases the number of free parameters of the
network but improves the recognition rate considerably.
The rest of the paper is organized as follows. In the next section, we introduce different
ways of training an RBF network. The main idea of the WRBF network is presented in Sect. 3.
The evaluation of the proposed network and the experimental results are given in Sect. 4,
and the conclusion is presented in Sect. 5.
2 RBF Training
There are three strategies for RBF training: no training, half training and full training.
The first is usually used in regression problems. It involves no iterative training process;
the network weights are calculated directly using the matrix inverse or pseudo-inverse [19].
For example, to interpolate a function through M points using an RBF network with M hidden
neurons, we set the centers $\mu_m$ of the hidden neurons equal to the training samples,
and then we have:
$$Z(X) = \sum_{m=1}^{M} w_m y_m \qquad (2)$$
Footnote 1: Wavelet neural network
This equation relates the hidden neurons to the output neuron. In this case, $Z$ is the only
output neuron of the network, $w_m$ is the weight between the $m$th hidden neuron and the
output neuron, and $y_m$ is the output of the $m$th hidden neuron (Eq. 1). Applying this
equation to all samples and writing the result in matrix form, we have:
$$YW = D \qquad (3)$$
where $Y$ is an $M \times M$ matrix, $W$ is an $M \times 1$ vector of weights and $D$ is the
target vector for the $M$ training points.
Now $W$ can be found using the matrix inverse:
$$W = Y^{-1} D \qquad (4)$$
If the number of hidden neurons is less than $M$, the network weights are found using the
pseudo-inverse:
$$W = (Y^T Y)^{-1} Y^T D \qquad (5)$$
This strategy is simple and effective when the number of training samples is small. However,
with large datasets, computing the matrix inverse becomes a challenge. Furthermore,
experience shows that this method does not generalize as well as the other methods.
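The no-training strategy can be illustrated on a toy problem. The sketch below places one Gaussian on each of four hypothetical 1-D samples and solves Eq. 3 directly by Gaussian elimination, which gives the same weights as Eq. 4 without forming the inverse explicitly. The data, the common spread and the helper names are assumptions made for illustration.

```python
import math

def gaussian_rbf(x, mu, sigma):
    """Eq. 1 for one hidden neuron."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, mu)) / (2.0 * sigma ** 2))

def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w

# Toy 1-D data (hypothetical): M = 4 samples, one hidden neuron per sample.
samples = [[0.0], [1.0], [2.0], [3.0]]
targets = [0.0, 1.0, 4.0, 9.0]
sigma = 1.0  # assumed common spread

# Y[k][m] is the response of hidden neuron m (centered on sample m) to sample k.
Y = [[gaussian_rbf(xk, mu, sigma) for mu in samples] for xk in samples]
W = solve_linear_system(Y, targets)  # same solution as W = Y^{-1} D (Eq. 4)

def predict(x):
    """Eq. 2: weighted sum of the hidden neuron outputs."""
    return sum(w * gaussian_rbf(x, mu, sigma) for w, mu in zip(W, samples))

print([round(predict(x), 6) for x in samples])
```

When there are fewer hidden neurons than samples, the same solver could be applied to the normal equations $(Y^T Y) W = Y^T D$, which corresponds to Eq. 5.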
In half training, the network weights are found through an iterative training process. The
common training method is back-propagation. As in an MLP network, the weights of the output
layer are adjusted based on a cost function such as the sum of squared errors.
In the no-training and half-training methods, the mean vectors $\mu_m$ and spread parameters
$\sigma_m$ of the hidden neurons are determined in advance and are not changed during
training. Methods for finding these parameters are described in [19].
In full training, all parameters of the network, including the weights of the output layer,
the mean vectors $\mu_m$ of the hidden neurons and the spread parameters $\sigma_m$, are
determined through the training process. After the parameters are initialized, a gradient
descent algorithm computes the gradient of the error with respect to each parameter, and an
update rule is then applied to adjust that parameter. For example, the output weights are
updated using the following equation:
$$w_{im} = w_{im} - \eta \frac{\partial E}{\partial w_{im}} \qquad (6)$$
Here $E$ is the cost function, e.g. the sum of squared errors, and $\eta$ is the learning rate.
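The update rule of Eq. 6 can be sketched for the simplest case: a single output neuron with a linear activation and fixed hidden-layer outputs. The linear output, the learning rate and all numerical values are simplifying assumptions for illustration; they are not the configuration used in the experiments.

```python
def update_output_weights(weights, hidden_outputs, target, eta):
    """One gradient descent step (Eq. 6) for a single linear output neuron.

    With E = (target - z)^2 and z = sum_m w_m * y_m,
    dE/dw_m = -2 * (target - z) * y_m.
    """
    z = sum(w * y for w, y in zip(weights, hidden_outputs))
    error = target - z
    return [w - eta * (-2.0 * error * y) for w, y in zip(weights, hidden_outputs)]

# Hypothetical hidden-layer outputs and target for one training sample.
weights = [0.0, 0.0, 0.0]
hidden_outputs = [0.9, 0.3, 0.1]
target = 1.0
eta = 0.1  # assumed learning rate

for _ in range(200):
    weights = update_output_weights(weights, hidden_outputs, target, eta)

z = sum(w * y for w, y in zip(weights, hidden_outputs))
print(round(z, 6))
```

Each step moves the weights against the gradient, so the output approaches the target as long as the learning rate is small enough for the step to remain stable.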
3 Weighted RBF
In this paper, we consider the RBF network from the classification point of view.
Furthermore, whenever we refer to training, we mean the full-training paradigm.
In the traditional RBF network, the adjustable parameters are the spreads $\sigma_m$ and
means $\mu_m$ of the hidden neurons and the weights between the hidden and output layers.
Comparing RBF with MLP, we observe that in an MLP the connections between the input and
hidden layers also have trainable weights, whereas in RBF there are no such weights and the
inputs go directly to the RBF neurons without any modification. In other words, these
connections have weights equal to unity (Fig. 3).
[Fig. 3: RBF network structure with inputs x1, x2, ..., xn, hidden neurons y1, ..., ym and outputs z1, z2, ..., zl]
This observation led to the main idea: if these weights are allowed to vary slightly around
1.0, they can perform some feature mapping and help the network perform better. We therefore
introduce another weight vector, $W$, which is applied to the input vector and performs a
linear mapping. The activation function (Eq. 1) then becomes:
$$y_m = f_m(X) = \exp\left(-\frac{\lVert W_m X - \mu_m \rVert^2}{2\sigma_m^2}\right) \qquad (7)$$
Here $W_m = \{w_{1m}, w_{2m}, \ldots, w_{nm}\}$ is the weight vector between the $m$th hidden
neuron and the input vector. Its dimension equals that of the input vector, so $W_m X$ is
read here as the element-wise product with components $w_{im} x_i$.
Adding this vector allows the input weights to vary around unity and thereby contribute to
feature mapping. As we will see in Sect. 3.1, these weights typically take values between
0.6 and 1.4.
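A minimal sketch of the WRBF activation in Eq. 7 is given below. It assumes that the weight vector multiplies the input element by element (each input $x_i$ scaled by $w_{im}$), which matches the description of the weights as multiplication factors for the input vector. The function name and all values are illustrative.

```python
import math

def wrbf_activation(x, w_m, mu_m, sigma_m):
    """Eq. 7 with W_m X read as the element-wise product w_im * x_i (an assumption)."""
    sq_dist = sum((w * xi - mi) ** 2 for w, xi, mi in zip(w_m, x, mu_m))
    return math.exp(-sq_dist / (2.0 * sigma_m ** 2))

x = [0.5, 1.0, -0.2]          # hypothetical input vector
mu_m = [0.4, 1.1, 0.0]        # hypothetical prototype vector
w_unity = [1.0] * len(x)      # unity weights reproduce the classic RBF (Eq. 1)
w_trained = [1.1, 0.9, 1.0]   # hypothetical trained weights near unity

print(wrbf_activation(x, w_unity, mu_m, 1.0))
print(wrbf_activation(x, w_trained, mu_m, 1.0))
```

With all weights equal to 1.0 the function reduces to the Gaussian of Eq. 1, so a WRBF network with unity weights behaves exactly like the traditional RBF network.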
What remains is to provide the adjustment rule for these weights. We used the gradient
descent algorithm to derive the weight adjustment formula (Appendix A).
3.1 Initialization
Given that the traditional RBF network has no weights between the input and hidden neurons,
it is reasonable to set the initial values of the new weight vector to 1.0 or to random
values around 1. Our experiments showed that the best choice for the initial value is 1.0.
Interestingly, the mean of the final trained weights is close to 1, e.g. 1.02 or 0.98.
Figure 4 shows the mean values of the weights connected to each neuron in the case of 128
hidden neurons.
4 Experimental Results
To compare the WRBF network with the basic RBF network, we implemented both in VC++ and
applied them to two classification problems: letter recognition and digit recognition. We
used the UCI dataset of English letters (footnote 2) for letter recognition and the HODA
digit dataset [31] for digit recognition. The UCI dataset contains 20000 samples, 16000 for
training and 4000 for testing. The HODA dataset contains 80000 samples of Farsi handwritten
digits, 60000 for training and 20000 for testing. The UCI dataset includes 16 features per
sample, while for the HODA dataset we extracted 64 features per sample using the average
pixel value in each of 64 windows.
Footnote 2: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition.
[Fig. 4 plot: average of weight vectors between input and hidden layer (Wi) versus hidden neuron; Mean = 1.0217, Std = 0.1561]
Fig. 4 Average values of new weight vectors between input and RBF layer of 128 hidden neurons
[Fig. 5 plot: RMS error versus epoch for Weighted RBF and Basic RBF]
Fig. 5 RMS Error for Basic RBF network (top) and WRBF network (bottom)
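The window-averaging step used for the HODA features might look roughly like the following sketch. The text does not state the image size or the window layout, so the 8 x 8 grid of equal windows and the 32 x 32 test image are assumptions.

```python
def window_average_features(image, grid=8):
    """Average pixel value in each cell of a grid x grid partition (64 features for grid=8).

    Assumes `image` is a rectangular list of rows whose height and width are
    multiples of `grid`; the paper does not specify the image size or layout.
    """
    height, width = len(image), len(image[0])
    win_h, win_w = height // grid, width // grid
    features = []
    for gr in range(grid):
        for gc in range(grid):
            total = 0.0
            for r in range(gr * win_h, (gr + 1) * win_h):
                for c in range(gc * win_w, (gc + 1) * win_w):
                    total += image[r][c]
            features.append(total / (win_h * win_w))
    return features

# Hypothetical 32x32 binary image: left half ink (1), right half background (0).
image = [[1 if c < 16 else 0 for c in range(32)] for _ in range(32)]
features = window_average_features(image)
print(len(features))             # 64
print(features[0], features[7])  # 1.0 0.0
```

Each feature is the mean intensity of one window, so the 64 values form a coarse, low-resolution summary of the digit image.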
For comparison with MLP, we also implemented a classic MLP with momentum and included it in
the experiments. To make the results fair, we ran each classifier several times; the results
reported below are the average recognition rates achieved for each network structure. The RMS
error is used as the stopping criterion: training is stopped whenever this error does not
change significantly in successive epochs. Figure 5 shows the error plot for the digit
recognition problem during training. The first advantage of WRBF over RBF can be seen in this
figure: the RMS error of the WRBF classifier is considerably lower than that of the RBF
classifier.
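The stopping rule described above can be sketched as a loop that ends when the RMS error changes by less than a tolerance between successive epochs. The tolerance, the epoch limit and the stand-in error curve are assumptions, since the text does not quantify "significantly".

```python
import math

def rms_error(targets, outputs):
    """Root-mean-square error over all output values."""
    n = len(targets)
    return math.sqrt(sum((t - z) ** 2 for t, z in zip(targets, outputs)) / n)

def train_until_stable(run_epoch, tol=1e-4, max_epochs=1000):
    """Call run_epoch() until the RMS error changes by less than tol.

    run_epoch is a caller-supplied function that trains for one epoch and
    returns the current RMS error; tol and max_epochs are assumed values.
    """
    previous = run_epoch()
    for epoch in range(2, max_epochs + 1):
        current = run_epoch()
        if abs(previous - current) < tol:
            return epoch, current
        previous = current
    return max_epochs, previous

# Stand-in for a real training epoch: an error curve that decays geometrically.
state = {"error": 0.24}

def fake_epoch():
    state["error"] *= 0.9
    return state["error"]

epochs, final_error = train_until_stable(fake_epoch)
print(epochs, round(final_error, 5))
```

In a real run, `run_epoch` would perform one pass of gradient descent over the training set and return `rms_error` computed on the network outputs.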
We tested each classifier with several numbers of hidden neurons to find the best choice and
to compare the networks across different structures. Table 1 shows the results of the three
classifiers for the UCI dataset. In this problem, WRBF outperforms both the classic RBF and
MLP networks in all structures by a significant margin. Compared with the traditional RBF,
the improvement is about 10%.
For the HODA dataset, which can be seen as a simple classification problem, WRBF again
performs better than RBF, as shown in Table 2. Compared with MLP, when the number of hidden
neurons is small, MLP produces superior results, but the best result of WRBF, 97.94%, is
still better than that of MLP.
A question arises from Tables 1 and 2: why are the roles of MLP and WRBF interchanged in the
first rows of Table 2? In fact, there is no theoretical explanation for this, since neural
networks were developed precisely to avoid manual interpretation and hand-crafted solutions.
Even so, some general observations may help explain the relation. As discussed before, with a
small number of neurons RBF may not perform as well as MLP, and this is evident in all
experiments. This is due to the nature of RBF: it places a number of Gaussian functions
around the data (Fig. 6b), and if the distribution of the samples is not approximately
Gaussian (as in Fig. 6c), many neurons are required to cover the input space (Fig. 6e, f),
whereas in MLP a few lines or hyperplanes can separate the classes (Fig. 6d).
This reasoning applies to the traditional RBF and MLP, but WRBF is a modified RBF structure
in which the inputs are mapped before passing through the Gaussian functions, so it cannot be
interpreted as easily as RBF or MLP. For now, we may think of WRBF as a nonlinear version of
RBF (since it assigns different weights to the inputs), which makes it perform better, but an
exact description of its behavior requires further work.
In general, our experiments show that WRBF consistently outperforms the traditional RBF and
can be used as an improved version of it. Furthermore, if the number of hidden neurons is
reasonable, WRBF produces better results than MLP.
5 Conclusion
In this paper, a new structure for the RBF network, named WRBF, was proposed. In this
network, the connections between the input and hidden neurons are allowed to have weights
around unity. Experiments showed that these weights, which are typically between 0.6 and 1.4,
bring significant improvements in pattern recognition problems. A gradient descent algorithm
was proposed for adjusting the new weights. Two classification problems were used to evaluate
the performance of the WRBF network: letter classification and digit recognition. In both
problems, WRBF outperformed the classic RBF and MLP networks. For example, with 200 hidden
neurons, WRBF achieved a recognition rate of 92.78% on the UCI dataset while RBF and MLP
achieved 83.13% and 89.25% respectively. On the HODA dataset, WRBF reached a 97.94%
recognition rate while RBF and MLP achieved 97.14% and 97.63% respectively.
Fig. 6 a Simple classification problem solved by MLP. b Similar problem solved with RBF. c Samples with
non-Gaussian distribution. d MLP solution for non-Gaussian samples. e RBF solution with few hidden neurons
(not a success). f RBF solution with large number of neurons
Appendix A
We use the sum of squared errors as the criterion to be minimized:
$$E = \sum_{j} (t_j - z_j)^2 \qquad (8)$$
where $t_j$ is the $j$th element of the target vector and $z_j$ is the $j$th element of the
output vector:
$$z_j = \frac{1}{1 + e^{-s_j}} \qquad (9)$$
Here $s_j$ is the weighted sum of the inputs to the $j$th output neuron:
$$s_j = \sum_{m=1}^{M} u_{mj} y_m \qquad (10)$$
Here $u_{mj}$ is the weight between the $m$th hidden neuron and the $j$th output neuron and
$y_m$ is the output of the $m$th hidden neuron as in Eq. 7.
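To find the adjustment value, we compute the gradient of the error (Eq. 8) with respect to
$w_{im}$ (the weight between the $i$th input neuron and the $m$th hidden neuron):
$$\frac{\partial E}{\partial w_{im}} = \sum_{j} \frac{\partial E}{\partial z_j} \times \frac{\partial z_j}{\partial y_m} \times \frac{\partial y_m}{\partial w_{im}} \qquad (11)$$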
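The chain rule of Eq. 11 can be checked numerically. The sketch below evaluates the three factors from Eqs. 7 to 10, assuming that $W_m X$ is the element-wise product (so that $\partial y_m / \partial w_{im} = -y_m (w_{im} x_i - \mu_{im}) x_i / \sigma_m^2$), and compares the result with a central finite difference. The network size and all values are hypothetical; this is an illustration under those assumptions, not the derivation as it appears in the original appendix.

```python
import math

def forward(x, w, mu, sigma, u):
    """WRBF forward pass: Eq. 7 hidden layer, Eqs. 9-10 sigmoid output layer.

    w[m][i] is the input weight w_im, u[m][j] is the output weight u_mj.
    """
    y = [
        math.exp(-sum((w[m][i] * x[i] - mu[m][i]) ** 2 for i in range(len(x)))
                 / (2.0 * sigma[m] ** 2))
        for m in range(len(mu))
    ]
    z = []
    for j in range(len(u[0])):
        s_j = sum(u[m][j] * y[m] for m in range(len(y)))
        z.append(1.0 / (1.0 + math.exp(-s_j)))
    return y, z

def sse(t, z):
    """Eq. 8: sum of squared errors."""
    return sum((tj - zj) ** 2 for tj, zj in zip(t, z))

def grad_input_weight(x, t, w, mu, sigma, u, i, m):
    """Chain rule of Eq. 11 evaluated for one weight w_im."""
    y, z = forward(x, w, mu, sigma, u)
    # sum_j dE/dz_j * dz_j/dy_m, with dE/dz_j = -2 (t_j - z_j)
    # and dz_j/dy_m = z_j (1 - z_j) u_mj
    back = sum(-2.0 * (t[j] - z[j]) * z[j] * (1.0 - z[j]) * u[m][j]
               for j in range(len(t)))
    # dy_m/dw_im = -y_m (w_im x_i - mu_im) x_i / sigma_m^2
    dy_dw = -y[m] * (w[m][i] * x[i] - mu[m][i]) * x[i] / sigma[m] ** 2
    return back * dy_dw

# Hypothetical tiny network: 2 inputs, 2 hidden neurons, 2 outputs.
x, t = [0.5, -1.0], [1.0, 0.0]
w = [[1.0, 1.0], [1.0, 1.0]]
mu = [[0.2, -0.5], [1.0, 0.3]]
sigma = [1.0, 1.5]
u = [[0.4, -0.3], [0.1, 0.7]]

analytic = grad_input_weight(x, t, w, mu, sigma, u, i=0, m=1)

# Central finite difference on the same weight as an independent check.
eps = 1e-6
w[1][0] += eps
e_plus = sse(t, forward(x, w, mu, sigma, u)[1])
w[1][0] -= 2 * eps
e_minus = sse(t, forward(x, w, mu, sigma, u)[1])
w[1][0] += eps
numeric = (e_plus - e_minus) / (2 * eps)

print(analytic, numeric)
```

Agreement between the two values supports the factorization in Eq. 11; the resulting gradient would then be used in an update of the same form as Eq. 6.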
References
12. Hojjatoleslami A, Sardo L, Kittler J (1997) An RBF based classifier for the detection of
microcalcifications in mammograms with outlier rejection capability. In: International conference on
neural networks, pp 1379–1384
13. Wang D et al (2002) Protein sequences classification using radial basis function (RBF) neural networks.
In: 9’th international conference on neural information processing, Singapore, pp 764–769
14. Chen S et al (2008) Symmetric RBF classifier for nonlinear detection in multiple-antenna-aided systems.
IEEE Trans Neural Netw 19(5):737–745
15. Meng K et al (2010) A self-adaptive RBF neural network classifier for transformer fault analysis. IEEE
Trans Power Syst 25(3):1350–1360
16. Vakil-Baghmisheh M-T, Pavesic N (2004) Training RBF networks with selective backpropagation.
Neurocomputing 62:39–64
17. King JL, Reznik L (2006) Topology selection for signal change detection in sensor networks: RBF vs.
MLP. In: International joint conference on neural networks, pp 2529–2535
18. Polat G, Altun H (2007) Evaluation of performance of KNN, MLP and RBF classifiers in emotion
detection problem. In: 15th conference on signal processing and communications applications, vol 1,
pp 1–4
19. Haykin S (2009) Neural networks and learning machines, 3rd edn. Prentice-Hall, Upper Saddle River
20. Werbos PJ (1974) Beyond regression: new tools for prediction and analysis in the behavioral science.
PhD thesis, Harvard University
21. Parker DB (1982) Learning logic. Invention Report S81-64, File 1, Office of Technology Licensing,
Stanford University
22. Ciocoiu IB (2002) RBF networks training using a dual extended Kalman filter. Neurocomputing 48:
609–622
23. Abe Y, Iiguni Y (2006) Fast computation of RBF coefficients using FFT. Sig Process 86(11):3264–3274
24. Ho K, Leung C-s, Sum J (2011) Training RBF network to tolerate single node fault. Neurocomputing
74(6):1046–1052
25. Sanchez AD (2002) Searching for a solution to the automatic RBF network design problem.
Neurocomputing 42:147–170
26. Zhang R et al (2007) Improved GAP-RBF network for classification problems. Neurocomputing 70(16–
18):3011–3018
27. Tao J, Wang N (2007) Splicing system based genetic algorithms for developing rbf networks models.
Chin J Chem Eng 15(2):240–246
28. Sug H (2010) Generating better radial basis function network for large data set of census. Int J Softw Eng
Appl 4(2):15–22
29. Zhang Q, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898
30. Balabin RM, Safieva RZ, Lomakina EI (2008) Wavelet neural network (WNN) approach for calibration
model building based on gasoline near infrared (NIR) spectra. Chemometr Intell Lab Syst 93:58–62
31. Khosravi H, Kabir E (2007) Introducing a very large dataset of handwritten Farsi digits and a study on
their varieties. Pattern Recogn Lett 28(10):1133–1141