
Neural Process Lett (2012) 35:177–186

DOI 10.1007/s11063-011-9210-0

A Novel Structure for Radial Basis Function Networks—WRBF

Hossein Khosravi

Published online: 29 December 2011


© Springer Science+Business Media, LLC. 2011

Abstract A novel structure for radial basis function networks is proposed. In this structure,
unlike the traditional RBF, we introduce weights between the input and hidden layers. These
weights, which take values around unity, act as multiplication factors for the input vector and
perform a linear mapping. Doing this increases the number of free parameters of the network,
but since these weights are trainable, the overall performance of the network improves
significantly. Because of the new weight vector, we call this structure the Weighted RBF, or
WRBF. The weight adjustment formula is derived by applying the gradient descent algorithm.
Two classification problems were used to evaluate the performance of the new RBF network:
letter classification using the UCI dataset with 16 features (a difficult problem) and digit
recognition using the HODA dataset with 64 features (an easy problem). WRBF is compared
with the classic RBF and MLP networks, and our experiments show that WRBF outperforms
both significantly. For example, in the case of 200 hidden neurons, WRBF achieved a
recognition rate of 92.78% on the UCI dataset, while RBF and MLP achieved 83.13% and
89.25%, respectively. On the HODA dataset, WRBF reached a 97.94% recognition rate,
whereas RBF achieved 97.14% and MLP 97.63%.

Keywords Radial basis · Neural network · RBF · WRBF · Classification · Gradient descent

1 Introduction

Classification is a key element in the field of machine learning. The purpose of classifi-
cation is to discriminate between two or more classes of objects having different features.
Humans classify objects easily; we see, experience, and learn to recognize objects. In machine
learning, however, algorithms must be developed to recognize objects. The artificial neural
network (ANN) is a popular machine learning technique for classification. ANNs are inspired
by biological neural networks, but current models are quite simple. Research is in progress
to make ANNs more like biological NNs [1–4].

H. Khosravi (B)
Department of Electrical and Robotic Engineering, Shahrood University of Technology, Shahrood, Iran
e-mail: Hosseinkhosravi@gmail.com


Fig. 1 One-dimensional Gaussian function with μ = 0 and σ = 2

Artificial neural networks have been defined by Kohonen as “massively parallel intercon-
nected networks of simple (usually adaptive) elements and their hierarchical organizations,
which are intended to interact with the objects of the real world in the same way as biological
nervous systems do” [5]. Neural networks attempt to achieve good performance via a dense
mesh of computing nodes and connections [6].
Application areas of neural networks include pattern classification [7,8], function approx-
imation [9], system identification, vehicle control, quantum chemistry [10], decision making,
sequence recognition, medical diagnosis, financial applications, data mining, visualization
and e-mail spam filtering.
The radial basis function (RBF) network is a special kind of neural network that was originally
used for regression problems, i.e. function approximation. In recent decades, however, RBF
networks have also proved well suited to classification problems and have provided valuable
results. RBF networks were first used for classification in 1964 [11], and since then they have
made their way into classification problems; today the RBF network is known as a reliable
classifier [12–15].
An RBF network consists of three layers: an input layer, a layer of radial basis functions, and
an output layer. The input layer takes the features extracted from the training samples and
delivers them directly to the RBF layer. The hidden neurons are radial basis functions. These
functions should satisfy four requirements [16]:

1. Attaining the maximum value at the center (zero distance).
2. Having a considerable value in the close neighborhood of the center.
3. Having a negligible value at far distances (which are close to other centers).
4. Being differentiable.

A typical function commonly used in the RBF layer is the Gaussian function (Fig. 1):

y_m = f_m(X) = \exp\left(-\frac{\|X - \mu_m\|^2}{2\sigma_m^2}\right)    (1)

Here m is the index of the hidden neuron, X is the input vector, μ_m is the mean (prototype)
vector of the mth neuron, and σ_m is the spread parameter.
In regression problems, the number of hidden neurons can be equal to the number of training
samples, so that the mean vector μ_m of each neuron is set to one of the training samples. In
this way, each Gaussian function covers part of the input space, and if the training samples
are suitably distributed, the network performs well.
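For concreteness, a minimal sketch of Eq. 1 is given below, assuming NumPy; the helper name rbf_layer and the array shapes are our own illustrative choices, not notation from the paper.

```python
import numpy as np

def rbf_layer(X, mu, sigma):
    """Gaussian RBF activations (Eq. 1) for a single input vector X.

    X: (n,) input vector, mu: (M, n) prototype vectors, sigma: (M,) spread parameters.
    Returns the (M,) vector of hidden-neuron outputs y_m.
    """
    sq_dist = np.sum((mu - X) ** 2, axis=1)        # ||X - mu_m||^2 for every hidden neuron
    return np.exp(-sq_dist / (2.0 * sigma ** 2))   # Eq. 1
```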


Fig. 2 Weighted sum of three Gaussian functions

In classification problems, where we deal with large datasets, the number of hidden neurons
is generally much smaller than the number of training samples, but still larger than in an MLP
classifier [17,18]. We will discuss this later, in Sect. 4.
The output layer of an RBF network is the same as the output layer of an MLP, and its neurons
can use several activation functions such as the sigmoid, linear, or hyperbolic tangent [19].
The connections between the hidden and output layers carry weights, which are the trainable
parameters of the RBF network. In other words, a weighted sum of radial basis functions,
applied to the input vector, is fed to the output layer. Figure 2 shows a function constructed
from a weighted sum of three Gaussian functions.
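To make the weighted-sum picture concrete, the short sketch below builds a curve of the kind shown in Fig. 2 from three one-dimensional Gaussians; the centers, spreads, and weights are arbitrary example values, not the ones used for the figure.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

x = np.linspace(-4.0, 4.0, 200)
centers, spreads, weights = [-2.0, 0.0, 2.0], [0.8, 1.0, 0.8], [1.0, 2.0, 1.5]

# A tiny "RBF output": weighted sum of three Gaussian basis functions over the input range
f = sum(w * gaussian(x, m, s) for w, m, s in zip(weights, centers, spreads))
```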
The common method for training an RBF network is the back-propagation algorithm,
originally proposed in [20,21]. Since then, several modifications of RBF networks have been
proposed. Some concern the training algorithms [22–24] and others the structure of RBF
networks, such as the number of hidden neurons and the type of radial basis functions [25–28].
Furthermore, some new networks have been derived from the idea of the RBF network; e.g.
the WNN1 is an RBF-like network whose activation functions are wavelets rather than
Gaussians (see [29,30] for details).
In this paper, we propose a structural modification of the RBF network that significantly
improves its results. The modification increases the number of free parameters of the network
but improves the recognition rate drastically.
The rest of the paper is organized as follows: in the next section, we introduce different ways
of training an RBF network. The main idea of the WRBF network is presented in Sect. 3.
Evaluation of the proposed network and experimental results are given in Sect. 4, and finally
the conclusion is presented in Sect. 5.

2 Training Paradigms for RBF Network

There are three strategies for RBF training: no training, half training, and full training. The
first one is usually used in regression problems. In this case, there is no iterative training
process; the network weights are calculated analytically using the matrix inverse or
pseudo-inverse [19]. For example, to interpolate a function with M points using an RBF
network with M hidden neurons, we set the centers μ_m of the hidden neurons equal to the
training samples, and then we have:
1 Wavelet neural network



Z(X) = \sum_{m=1}^{M} w_m y_m    (2)

This equation describes the relationship between the hidden and output neurons. In this case,
Z is the only output neuron of the network, w_m is the weight between the mth hidden neuron
and the output neuron, and y_m is the output of the mth hidden neuron (Eq. 1). Applying this
equation to all samples and writing it in matrix form, we have:

YW = D    (3)

where Y is an M × M matrix, W is an M × 1 vector of weights, and D is the target vector for
the M training points.
Now W can be found using the matrix inverse:

W = Y^{-1} D    (4)

If the number of hidden neurons is less than M, the network weights are found using the
pseudo-inverse:

W = (Y^T Y)^{-1} Y^T D    (5)

This strategy is simple and effective when the number of training samples is small. However,
when dealing with large datasets, inverting the matrix becomes a challenge. Furthermore,
experience shows that the generalization of this method is not as good as that of the other
methods.
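A minimal sketch of this no-training strategy is shown below, assuming NumPy and reusing the rbf_layer helper from the earlier sketch. It uses np.linalg.pinv, which covers both the square case of Eq. 4 and the pseudo-inverse of Eq. 5 and is numerically more stable than forming (Y^T Y)^{-1} explicitly.

```python
import numpy as np

def fit_no_training(X_train, d, mu, sigma):
    """Closed-form output weights: solve Y W = D (Eq. 3).

    X_train: (N, n) training samples, d: (N,) targets,
    mu: (M, n) fixed centers, sigma: (M,) fixed spreads.
    """
    Y = np.stack([rbf_layer(x, mu, sigma) for x in X_train])  # (N, M) hidden-layer matrix
    return np.linalg.pinv(Y) @ d                              # Eqs. 4-5 via the pseudo-inverse
```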
In half training, the network weights are found through an iterative training process. The
common training method is back-propagation. In this approach, which is similar to that of the
MLP network, the weights of the output layer are adjusted based on a cost function such as
the sum of squared errors.
In the no-training and half-training methods, the mean vectors of the hidden neurons, μ_m,
and the spread parameters, σ_m, are determined in advance and are not changed during the
training process. Methods for finding these parameters are described in [19].
In full training, all parameters of the network, including the output-layer weights, the mean
vectors μ_m, and the spread parameters σ_m, are determined through the training process.
After initializing the parameters, a gradient descent algorithm is applied to compute the error
gradient with respect to each parameter, and an update rule is then applied to adjust it. For
example, the output weights are updated using the following equation:

w_{im} = w_{im} - \eta \frac{\partial E}{\partial w_{im}}    (6)

Here E is the cost function, e.g. the sum of squared errors, and η is the learning rate.
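As a concrete illustration of the update in Eq. 6 for the output-layer weights, here is a minimal sketch under assumptions spelled out only in Appendix A (sigmoid output neurons and a sum-of-squared-error cost). It reuses the rbf_layer helper from the earlier sketch, and U collects the hidden-to-output weights u_mj; the function name and shapes are ours.

```python
import numpy as np

def output_weight_epoch(X_train, T, mu, sigma, U, lr=0.05):
    """One epoch of output-weight updates (Eq. 6); centers and spreads stay fixed."""
    for x, t in zip(X_train, T):                   # t: (L,) target vector for sample x
        y = rbf_layer(x, mu, sigma)                # hidden outputs y_m, shape (M,)
        z = 1.0 / (1.0 + np.exp(-(y @ U)))         # sigmoid output neurons, shape (L,)
        delta = -2.0 * (t - z) * z * (1.0 - z)     # dE/ds_j for the squared-error cost
        U = U - lr * np.outer(y, delta)            # gradient-descent step on u_mj
    return U
```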

3 Weighted RBF

In this paper, we focus on the RBF network from the classification point of view. Furthermore,
whenever we talk about training, we mean the full-training paradigm.
In the traditional RBF network, the adjustable weights are the σs and μs of the hidden neurons
and the weights between the hidden and output layers. Comparing RBF with MLP, we note
that in an MLP the connections between the input and hidden layers also have trainable
weights, while in RBF there are no such weights and the input neurons feed the RBF neurons
directly, without any modification. In other words, these connections have weights equal to
unity (Fig. 3).


Fig. 3 Traditional RBF network structure (inputs x_1 … x_n feed the radial basis neurons y_1 … y_M directly; the outputs are z_1 … z_l)

This is where the main idea was formed: if these weights could oscillate slightly around 1.0,
they could perform some feature mapping and help the network perform better. So we
introduced another weight vector, W, which is applied to the input vector and performs a
linear mapping. The activation function (Eq. 1) then changes as follows:

y_m = f_m(X) = \exp\left(-\frac{\|W_m X - \mu_m\|^2}{2\sigma_m^2}\right)    (7)

Here W_m = {w_1m, w_2m, ..., w_nm} is the weight vector between the input vector and the
mth hidden neuron, and W_m X denotes the element-wise product of the two vectors. The
dimension of this vector is equal to the dimension of the input vector.
By adding this vector, we give the input weights the opportunity to oscillate around unity and
thus contribute to feature mapping. As we will see in Sect. 3.1, these weights typically take
values between 0.6 and 1.4.
The main thing we must now provide is the adjustment of these weights. We used the gradient
descent algorithm to derive the weight adjustment formula (Appendix A).
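A minimal sketch of the WRBF hidden layer (Eq. 7) follows, assuming NumPy and reading W_m X as the element-wise product of the weight vector and the input, consistent with Eq. 14 in Appendix A; the helper name wrbf_layer is ours.

```python
import numpy as np

def wrbf_layer(X, W, mu, sigma):
    """WRBF hidden activations (Eq. 7): the input is first scaled element-wise by W_m.

    X: (n,) input, W: (M, n) new input weights (initialized near 1.0),
    mu: (M, n) centers, sigma: (M,) spreads.
    """
    sq_dist = np.sum((W * X - mu) ** 2, axis=1)    # ||W_m * X - mu_m||^2 per hidden neuron
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

Setting W to all ones reproduces the traditional RBF layer of Eq. 1, which is why initialization at 1.0 (Sect. 3.1) is a natural starting point.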

3.1 Initialization

Given that in the traditional RBF network there are no weights between the input and hidden
neurons, it is reasonable to set the initial values of the new weight vector equal to 1.0, or to
random values around 1. Our experiments showed that the best choice for the initial value
is 1.0.
It is interesting to note that the mean of the final trained weights stays close to 1, e.g. 1.02 or
0.98. Figure 4 shows the mean value of the weights connected to each neuron in the case of
128 hidden neurons.

4 Experimental Results

To compare the WRBF network with the basic RBF, we implemented both networks in VC++
and used them for two classification problems: letter recognition and digit recognition. We
used the UCI dataset of English letters2 for letter recognition and the HODA digit dataset [31]
for digit recognition. The UCI dataset contains 20000 samples, 16000 for training and 4000
for test. The HODA dataset contains 80000 samples of Farsi handwritten digits: 60000 for
training and 20000 for test.

2 http://archive.ics.uci.edu/ml/datasets/Letter+Recognition.


Fig. 4 Average values of the new weight vectors between the input and RBF layer for 128 hidden neurons (mean = 1.0217, std = 0.1561)

Fig. 5 RMS error versus epoch for the basic RBF network (top) and the WRBF network (bottom)

The UCI dataset includes 16 features per sample, while for the HODA dataset we extracted
64 features per sample using the average pixel value in 64 windows.
To allow a comparison with MLP, we also implemented a classic MLP with momentum and
include its results as well. To keep the results fair, we ran each classifier several times, and
the following results are the average recognition rates achieved for each network structure.
RMS error is used as the stopping criterion: whenever this error did not change significantly
over successive epochs, training was stopped. Figure 5 shows the error plot for the digit
recognition problem during training. The first advantage of WRBF over RBF can be observed
in this figure: the RMS error of the WRBF classifier is drastically lower than that of the RBF
classifier.
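The paper does not give the exact stopping threshold; the sketch below is one plausible way to implement the described criterion, with tol and patience as illustrative values of our own choosing.

```python
def should_stop(rms_history, tol=1e-4, patience=3):
    """Stop when the RMS error has changed by less than tol over the last few epochs."""
    if len(rms_history) < patience + 1:
        return False
    recent = rms_history[-(patience + 1):]
    return max(recent) - min(recent) < tol
```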
We tested each classifier with several numbers of hidden neurons, both to find the best choice
and to compare the networks' abilities at different structures. Table 1 shows the results of the
three classifiers on the UCI letter dataset. In this problem, WRBF outperforms both the classic
RBF and the MLP network in all structures, by a significant margin.


Table 1 Results of MLP, RBF and WRBF classifiers on the UCI letter dataset (number of classes = 26; feature length = 16)

Hidden neurons   MLP (%)   RBF (%)   WRBF (%)   Best classifier
16               65.00     68.58     77.1       WRBF
32               76.72     72.75     82.10      WRBF
64               82.05     76.38     85.63      WRBF
96               85.80     77.55     87.33      WRBF
128              86.78     81.13     88.48      WRBF
200              89.25     83.13     92.78      WRBF

Table 2 Results of MLP, RBF and WRBF classifiers on the HODA digit dataset (number of classes = 10; feature length = 64)

Hidden neurons   MLP (%)   RBF (%)   WRBF (%)   Best classifier
16               95.85     92.49     95.63      MLP
32               96.92     94.23     96.84      MLP
64               97.22     96.00     97.50      WRBF
96               97.17     96.50     97.66      WRBF
128              97.20     96.94     97.81      WRBF
200              97.63     97.14     97.94      WRBF

Compared with the traditional RBF, we obtain an improvement of about 10%.
In the case of the HODA dataset, which can be seen as a simple classification problem,
Table 2 shows that WRBF again performs better than RBF. Compared with MLP, when the
number of hidden neurons is small, MLP generates superior results, but the best result of
WRBF, 97.94%, is still better than that of MLP.
Tables 1 and 2 may raise a question: why are the roles of MLP and WRBF interchanged in
the first rows of Table 2? In fact, there is no exact theoretical explanation for this, since the
structure of neural networks was developed to avoid manual interpretation and hand-crafted
solutions. Even so, some general observations may help explain the relation. As discussed
before, for a small number of neurons, RBF may not perform as well as MLP, and this is
evident in all experiments. It is due to the nature of RBF: it builds a set of Gaussian functions
around the data (Fig. 6b), and if the distribution of the samples is not semi-Gaussian (as in
Fig. 6c), many neurons are required to cover the input space (Fig. 6e, f), while in an MLP a
few lines or hyperplanes can separate the classes (Fig. 6d).
The previous statement applies to the traditional RBF and MLP, but WRBF is a modified RBF
structure in which the inputs are mapped before going through the Gaussian functions, so it
cannot be interpreted as easily as RBF or MLP. For now, we may think of WRBF as a nonlinear
version of RBF (since it assigns different weights to the inputs), which makes it perform
better, but an exact description of its behavior requires further work.
In general, it is clear from our experiments that WRBF always outperforms the traditional
RBF and can be used as an improved version of it. Furthermore, if the number of hidden
neurons is reasonable, WRBF also generates better results than MLP.

5 Conclusion

In this paper, a new structure for the RBF network, named WRBF, was proposed. In this
network, the connections between the input and hidden neurons are allowed to carry weights
around unity.

Fig. 6 a Simple classification problem solved by MLP. b Similar problem solved with RBF. c Samples with
non-Gaussian distribution. d MLP solution for non-Gaussian samples. e RBF solution with few hidden neurons
(not a success). f RBF solution with a large number of neurons

Experiments showed that these weights, which typically lie between 0.6 and 1.4, yield
significant improvements in pattern recognition problems. A gradient descent algorithm was
derived for the adjustment of the new weights. Two classification problems were used to
evaluate the performance of the WRBF network: letter classification and digit recognition.
In both problems, WRBF outperformed the classic RBF and MLP networks. For example, in
the case of 200 hidden neurons, WRBF achieved a recognition rate of 92.78% on the UCI
dataset, while RBF and MLP achieved 83.13% and 89.25%, respectively. On the HODA
dataset, WRBF reached a 97.94% recognition rate, while RBF and MLP achieved 97.14%
and 97.63%, respectively.

Appendix A: Weight Adjustment of WRBF

We use the sum of squared errors as the criterion to be minimized:

E = \sum_j (t_j - z_j)^2    (8)

where t_j is the jth element of the target vector and z_j is the jth element of the output vector:

z_j = \frac{1}{1 + e^{-s_j}}    (9)

Here s_j is the weighted sum of the nodes feeding the jth output neuron:

s_j = \sum_{m=1}^{M} u_{mj} y_m    (10)

Here, u_{mj} is the weight between the mth hidden neuron and the jth output neuron, and
y_m is the output of the mth hidden neuron as in Eq. 7.


To find the adjustment value, we compute the gradient of the error (Eq. 8) with respect to
w_{im} (the weight between the ith input neuron and the mth hidden neuron):

\frac{\partial E}{\partial w_{im}} = \sum_j \frac{\partial E}{\partial z_j} \times \frac{\partial z_j}{\partial y_m} \times \frac{\partial y_m}{\partial w_{im}}    (11)

Now we compute these three parts. According to Eq. 8:

\frac{\partial E}{\partial z_j} = -2(t_j - z_j)    (12)

According to Eqs. 9 and 10:

\frac{\partial z_j}{\partial y_m} = \frac{\partial z_j}{\partial s_j} \times \frac{\partial s_j}{\partial y_m} = z_j(1 - z_j) \times u_{mj}    (13)

And finally, according to Eq. 7, we have:

\frac{\partial y_m}{\partial w_{im}} = -y_m x_i \frac{w_{im} x_i - \mu_{im}}{\sigma_m^2}    (14)

Combining Eqs. 12–14, we obtain the error gradient with respect to w_{im}:

\frac{\partial E}{\partial w_{im}} = -y_m x_i \frac{w_{im} x_i - \mu_{im}}{\sigma_m^2} \sum_j \left[-2(t_j - z_j) \times z_j(1 - z_j) \times u_{mj}\right]    (15)

Now we can update the new weights according to the following equation:

w_{im} = w_{im} - \eta \frac{\partial E}{\partial w_{im}}    (16)

Here η is the learning rate, which can be constant, e.g. 0.05, or decrease monotonically.
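Putting Eqs. 11–16 together, a minimal sketch of one update step for the input weights is given below. It assumes NumPy, reuses the wrbf_layer helper from the sketch in Sect. 3, and follows the appendix notation (u_mj collected in a matrix U, sigmoid outputs, squared-error cost); the function name and shapes are ours.

```python
import numpy as np

def update_input_weights(x, t, W, mu, sigma, U, lr=0.05):
    """One gradient-descent step on the WRBF input weights w_im (Eqs. 11-16).

    x: (n,) input, t: (L,) target, W and mu: (M, n), sigma: (M,), U: (M, L) output weights.
    """
    y = wrbf_layer(x, W, mu, sigma)                        # hidden outputs y_m (Eq. 7)
    z = 1.0 / (1.0 + np.exp(-(y @ U)))                     # output activations z_j (Eq. 9)
    delta = -2.0 * (t - z) * z * (1.0 - z)                 # dE/dz_j * dz_j/ds_j (Eqs. 12-13)
    back = U @ delta                                       # sum over j of delta_j * u_mj, shape (M,)
    # Eq. 14: dy_m/dw_im = -y_m * x_i * (w_im * x_i - mu_im) / sigma_m^2
    dy_dw = -(y / sigma ** 2)[:, None] * x[None, :] * (W * x - mu)
    grad = back[:, None] * dy_dw                           # Eq. 15, shape (M, n)
    return W - lr * grad                                   # Eq. 16
```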

References

1. Knudsen EI (1994) Supervised learning in the brain. J Neurosci 14(7):3985–3997
2. Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw 10:1659–1671
3. Izhikevich EM (2004) Which model to use for cortical spiking neurons? IEEE Trans Neural Netw
15(5):1063–1070
4. da Silva AB, Rosa JLG (2011) Advances on criteria for biological plausibility in artificial neural networks:
think of learning processes. In: International joint conference on neural networks, California
5. Kohonen T (1988) An introduction to neural computing. Neural Netw 1:4
6. Huang SH, Zhang H-C (1994) Artificial neural networks in manufacturing: concepts, applications, and
perspectives. IEEE Trans Compon Packag Manuf Technol 17(2):212–228
7. Balabin RM, Safieva RZ (2008) Motor oil classification by base stock and viscosity based on near infrared
(NIR) spectroscopy data. Fuel 87:2745–2752
8. Khosravi H, Kabir E (2010) Farsi font recognition based on Sobel–Roberts features. Pattern Recogn Lett
31:75–82
9. Xiaohu L et al (2011) A new multilayer feedforward small-world neural network with its performances
on function approximation. In: IEEE Int’l Conf. on computer science and automation engineering,
pp 353–357
10. Balabin RM, Lomakina EI (2009) Neural network approach to quantum-chemistry data: accurate predic-
tion of density functional theory energies. J Chem Phys 131(7):1041–1048
11. Bashkirov OA, Braverman EM, Muchnik IB (1964) Potential function algorithms networks for pattern
recognition learning machines. Autom Remote Control, 629–631


12. Hojjatoleslami A, Sardo L, Kittler J (1997) An RBF based classifier for the detection of microcalcifications
in mammograms with outlier rejection capability. In: International conference on neural networks,
pp 379–1384
13. Wang D et al (2002) Protein sequences classification using radial basis function (RBF) neural networks.
In: 9th international conference on neural information processing, Singapore, pp 764–769
14. Chen S et al (2008) Symmetric RBF classifier for nonlinear detection in multiple-antenna-aided systems.
IEEE Trans Neural Netw 19(5):737–745
15. Meng K et al (2010) A self-adaptive RBF neural network classifier for transformer fault analysis. IEEE
Trans Power Syst 25(3):1350–1360
16. Vakil-Baghmisheh M-T, Pavesic N (2004) Training RBF networks with selective backpropagation.
Neurocomputing 62:39–64
17. King JL, Reznik L (2006) Topology selection for signal change detection in sensor networks: RBF vs.
MLP. In: International joint conference on neural networks, pp 2529–2535
18. Polat G, Altun H (2007) Evaluation of performance of KNN, MLP and RBF classifiers in emotion
detection problem. In: 15th conference on signal processing and communications applications, vol 1,
pp 1–4
19. Haykin S (2009) Neural networks and learning machines, 3rd edn. Prentice-Hall, Upper Saddle River
20. Werbos PJ (1974) Beyond regression: new tools for prediction and analysis in the behavioral sciences.
PhD thesis, Harvard University
21. Parker DB (1982) Learning logic. Invention Report S81-64, File 1, Office of Technology Licensing,
Stanford University
22. Ciocoiu IB (2002) RBF networks training using a dual extended Kalman filter. Neurocomputing 48:
609–622
23. Abe Y, Iiguni Y (2006) Fast computation of RBF coefficients using FFT. Sig Process 86(11):3264–3274
24. Ho K, Leung C-s, Sum J (2011) Training RBF network to tolerate single node fault. Neurocomputing
74(6):1046–1052
25. Sanchez AD (2002) Searching for a solution to the automatic RBF network design problem. Neurocom-
puting 42:147–170
26. Zhang R et al (2007) Improved GAP-RBF network for classification problems. Neurocomputing 70(16–
18):3011–3018
27. Tao J, Wang N (2007) Splicing system based genetic algorithms for developing rbf networks models.
Chin J Chem Eng 15(2):240–246
28. Sug H (2010) Generating better radial basis function network for large data set of census. Int J Softw Eng
Appl 4(2):15–22
29. Zhang Q, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898
30. Balabin RM, Safieva RZ, Lomakina EI (2008) Wavelet neural network (WNN) approach for calibration
model building based on gasoline near infrared (NIR) spectra. Chemometr Intell Lab Syst 93:58–62
31. Khosravi H, Kabir E (2007) Introducing a very large dataset of handwritten Farsi digits and a study on
their varieties. Pattern Recogn Lett 28(10):1133–1141

