
Swarm and Evolutionary Computation 1 (2011) 164–171
journal homepage: www.elsevier.com/locate/swevo

Regular paper

Clustering using firefly algorithm: Performance study

J. Senthilnath, S.N. Omkar ∗, V. Mani
Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

Article info

Article history:
Received 10 February 2011
Received in revised form 5 May 2011
Accepted 2 June 2011
Available online 30 June 2011

Keywords:
Clustering
Classification
Firefly algorithm

Abstract

The Firefly Algorithm (FA) is a recent nature inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique used to identify homogeneous groups of objects based on the values of their attributes. In this paper, the FA is applied to clustering on benchmark problems, and its performance is compared with that of two other nature inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), as well as nine other methods used in the literature. Thirteen typical benchmark data sets from the UCI machine learning repository are used to demonstrate the results of the techniques. From the results obtained, we compare the performance of the FA and conclude that the FA can be efficiently used for clustering.

Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Clustering is an important unsupervised classification technique, in which a set of patterns, usually vectors in a multidimensional space, is grouped into clusters (or groups) based on some similarity metric [1–4]. Clustering is often used in a variety of applications in statistical data analysis, image analysis, data mining and other fields of science and engineering.

Clustering algorithms can be classified into two categories: hierarchical clustering and partitional clustering [5,6]. Hierarchical clustering constructs a hierarchy of clusters by splitting a large cluster into smaller ones and merging smaller clusters into their nearest centroid [7]. There are two main approaches: (i) the divisive approach, which splits a larger cluster into two or more smaller ones; and (ii) the agglomerative approach, which builds a larger cluster by merging two or more smaller clusters. On the other hand, partitional clustering [8,9] attempts to divide the data set into a set of disjoint clusters without a hierarchical structure. The most widely used partitional clustering algorithms are the prototype-based clustering algorithms, in which each cluster is represented by its center. The objective function (a squared error function) is the sum of the distances from the patterns to the centers [6]. In this paper we are concerned with partitional clustering for generating cluster centers and then using these cluster centers to classify the data set.

A popular partitional clustering algorithm, k-means clustering, is essentially a function minimization technique in which the objective function is the squared error. However, the main drawback of the k-means algorithm is that it converges to a local minimum from the starting position of the search [10]. To overcome local optima problems, many nature inspired algorithms, such as genetic algorithms [11], ant colony optimization [12], artificial immune systems [13], artificial bee colony [9], and particle swarm optimization [14], have been used. Recently, efficient hybrid evolutionary optimization algorithms that combine evolutionary methods with k-means to overcome local optima problems in clustering have been proposed [15–17].

The Firefly Algorithm (FA) is a recent nature inspired technique [18] that has been used for solving nonlinear optimization problems. This algorithm is based on the behavior of social insects (fireflies). In social insect colonies, each individual seems to have its own agenda, and yet the group as a whole appears to be highly organized. Algorithms based on nature have been shown to be effective and efficient at solving difficult optimization problems. A swarm is a group of multi-agent systems, such as fireflies, in which simple agents coordinate their activities to solve the complex problem of allocating communication to multiple forage sites in dynamic environments.

In this study, the Firefly Algorithm (FA), which is described by Yang [18] for numerical optimization problems, is applied to clustering. To study the performance of the FA on clustering problems, we consider the standard benchmark problems (13 typical test databases) that are available in the literature [9,14]. The performance of the FA on clustering is compared with the results of two other nature inspired techniques, Artificial Bee Colony (ABC) [19] and Particle Swarm Optimization (PSO) [20], on the same test data sets [9,14]. The FA, ABC and PSO algorithms are in the same class of population-based, nature inspired optimization techniques. Hence, we compare the performance of the FA with the ABC and PSO algorithms.

∗ Corresponding author.
E-mail address: omkar@aero.iisc.ernet.in (S.N. Omkar).

doi:10.1016/j.swevo.2011.06.003

We also present the results of the nine other methods used in the literature [9,14]. For ease of understanding and comparison, we follow the same manner of analysis and discussion used in [9]; the only key difference is the use of the FA in this study.

Contribution of this paper: In this work, for a given data set, the FA is used to find the cluster centers. The cluster centers are obtained from a randomly selected 75% of the given data set, which we call the training set. The FA uses this training set to obtain the cluster centers. To study the performance of the FA, the remaining 25% of the data set (called the test data set) is used. The performance measure used for the FA is the classification error percentage (CEP), defined as the ratio of the number of misclassified samples in the test data set to the total number of samples in the test data set. This can be computed because, for the test data set, we know the actual class of each sample. The distances between a given test sample and the cluster centers are computed, and the sample is assigned to the cluster center (class) with the minimum distance. Hence, we can compute the performance measure, the classification error percentage (CEP).

The paper is organized as follows: the implementation of the FA is described in Section 2; clustering using the FA and the performance evaluation are described in Sections 3 and 4, respectively; and the results are presented and discussed in Section 5. We conclude the paper in Section 6 by summarizing the observations.

2. Firefly algorithm

Fireflies are glowworms that glow through bioluminescence. For simplicity in describing our firefly algorithm, we use the following three idealized rules: (i) all fireflies are unisex, so that one firefly will be attracted to other fireflies regardless of their sex; (ii) an important and interesting behavior of fireflies is to glow brighter, mainly to attract prey and to share food with others; (iii) attractiveness is proportional to brightness, so each agent first moves toward a neighbor that glows brighter [21].

The Firefly Algorithm (FA) [18] is a population-based algorithm for finding the global optima of objective functions based on swarm intelligence, drawing on the foraging behavior of fireflies. In the FA, physical entities (agents, or fireflies) are randomly distributed in the search space. The agents are thought of as fireflies that carry a luminescence quality, called luciferin, and emit light proportional to its value. Each firefly is attracted by the brighter glow of other neighboring fireflies, and the attractiveness decreases as their distance increases. If there is no firefly brighter than a particular one, that firefly moves randomly. In the application of the FA to clustering, the decision variables are the cluster centers, and the objective function is related to the sum, over all training set instances, of the Euclidean distance in an N-dimensional space, as given in [9].

Based on this objective function, all the agents (fireflies) are initially dispersed randomly across the search space. The two phases of the firefly algorithm are as follows.

i. Variation of light intensity: Light intensity is related to objective values [18], so for a maximization/minimization problem a firefly with high/low intensity will attract another firefly with high/low intensity. Assume that there exists a swarm of n agents (fireflies), where xi represents a solution for firefly i and f(xi) denotes its fitness value. The brightness Ii of firefly i is chosen to reflect the fitness value f(xi) at its current position xi [18]:

    Ii = f(xi),  1 ≤ i ≤ n.  (1)

ii. Movement toward an attractive firefly: A firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies [18]. Each firefly has a distinctive attractiveness β, which determines how strongly it attracts other members of the swarm. However, the attractiveness β is relative: it varies with the distance rij between two fireflies i and j at locations xi and xj, respectively, given by

    rij = ‖xi − xj‖.  (2)

The attractiveness function β(r) of the firefly is determined by

    β(r) = β0 e^(−γ r²)  (3)

where β0 is the attractiveness at r = 0 and γ is the light absorption coefficient.

The movement of a firefly i at location xi attracted to another, more attractive (brighter) firefly j at location xj is determined by

    xi(t + 1) = xi(t) + β0 e^(−γ r²) (xj − xi).  (4)

A detailed description of the FA is given in [18]. A pseudo-code of the algorithm is given below.

Pseudo-code: a high-level description of the firefly algorithm

Input:
    Create an initial population of n fireflies within the d-dimensional search space, xik, i = 1, 2, ..., n and k = 1, 2, ..., d
    Evaluate the fitness of the population, f(xik), which is directly proportional to the light intensity Iik
    Algorithm parameters: β0, γ
Output:
    Obtained minimum location: ximin

begin
    repeat
        for i = 1 to n
            for j = 1 to n
                if (Ij < Ii)
                    Move firefly i toward j in d dimensions using Eq. (4)
                end if
                Attractiveness varies with distance r via exp[−γ r²]
                Evaluate new solutions and update light intensity using Eq. (1)
            end for j
        end for i
        Rank the fireflies and find the current best
    until stop condition true
end

3. Clustering using FA

The clustering methods, which separate objects into groups or classes, are developed based on unsupervised learning. In the unsupervised technique, the training data set is grouped first, based solely on the numerical information in the data (i.e., cluster centers), and the groups are then matched by the analyst to information classes. The data sets that we tackle contain the class information for each datum. Therefore, the main goal is to find the centers of the clusters by minimizing the objective function, the sum of the distances of the patterns to their centers.

For given N objects, the problem is to minimize the sum of squared Euclidean distances between each pattern and its cluster center, allocating each pattern to one of K cluster centers. The clustering objective function is the sum of squared errors, given in Eq. (5) as described in [22]:

    J(K) = Σ(k=1..K) Σ(i∈ck) ‖xi − ck‖²  (5)

where K is the number of clusters, xi (i = 1, ..., n) is the location of the ith of the n given patterns, and ck (k = 1, ..., K) is the kth cluster center, found by Eq. (6):

    ck = (1/nk) Σ(i∈Ck) xi  (6)

where nk is the number of patterns in the kth cluster.
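The firefly search described in Section 2 (Eqs. (1)–(4) and the pseudo-code) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the unit-cube search range, the population size, and the quadratic test objective in the usage note below are assumptions chosen for the sketch. The problem is posed as minimization, so a lower objective value corresponds to a brighter firefly.

```python
import numpy as np

def firefly_minimize(objective, dim, n=10, beta0=1.0, gamma=1.0,
                     generations=50, seed=0):
    """Minimal firefly algorithm sketch: minimize `objective` over [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, dim))         # initial firefly positions
    intensity = np.array([objective(p) for p in x])  # Eq. (1): I_i = f(x_i)
    for _ in range(generations):
        for i in range(n):
            for j in range(n):
                if intensity[j] < intensity[i]:      # j is brighter (lower cost)
                    r = np.linalg.norm(x[i] - x[j])            # Eq. (2)
                    beta = beta0 * np.exp(-gamma * r ** 2)     # Eq. (3)
                    x[i] = x[i] + beta * (x[j] - x[i])         # Eq. (4)
                    intensity[i] = objective(x[i])   # update light intensity
        best = int(np.argmin(intensity))             # rank, track current best
    return x[best].copy(), float(intensity[best]), x
```

Because the movement rule here is purely attractive, the swarm contracts toward its brightest member; Yang's full algorithm additionally adds a small random step, which is omitted in this sketch for brevity. For example, minimizing `lambda p: float(np.sum((p - 0.5) ** 2))` with `dim=2` drives the whole population close to the best firefly found.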

The cluster analysis forms the assignment of the data set into clusters, so that data can be grouped into the same cluster based on some similarity measure [23]. Distance measurement is most widely used for evaluating similarities between patterns. The cluster centers are the decision variables, which are obtained by minimizing the sum, over all training set instances in the d-dimensional space, of the Euclidean distance between a generic instance xj and the center of its cluster. The cost (objective) function for firefly i is given by Eq. (7), as in [9,14]:

    fi = (1/DTrain) Σ(j=1..DTrain) d(xj, pi^CLknown(xj))  (7)

where DTrain is the number of training instances, which normalizes the sum so that any distance lies within [0.0, 1.0], and pi^CLknown(xj) denotes the center of the class that instance xj belongs to according to the database.

Note that in our FA the decision variables are the cluster centers, and the objective function is given by Eq. (7). In our study, we consider the 13 standard benchmark problems given in [14]. For a given data set, let n be the number of data points, d the dimension, and c the number of classes. A given data point belongs to only one of these c classes. Of the given data set, 75% is randomly selected to obtain the cluster centers using Eq. (7); in this way we obtain the cluster centers for all c classes. The remaining 25% of the data set (the test data set) is used to obtain the classification error percentage (CEP). An illustrative example of the FA and its performance measures is given in the next section.

4. Performance measures and an illustrative example

As discussed in the earlier section, the training data sets are used in the firefly algorithm to extract the knowledge of each class in the form of cluster centers. Using these cluster centers, the test data set is classified and the performance of the classification is analyzed.

4.1. Performance evaluation

The performance of the knowledge extracted by the FA in the form of cluster centers is evaluated using the Classification Error Percentage (CEP) and the classification efficiency. The CEP depends only on the test data, while the classification efficiency depends on both the training and the test data.

4.1.1. Classification Error Percentage (CEP)

The CEP is obtained using only the test data [9]. For each problem, we report the CEP, which is the percentage of incorrectly classified patterns of the test data sets, as given in [9], to make a reliable comparison. Each pattern is classified by assigning it to the class whose cluster center is closest. The classified output is then compared with the desired output; if they are not exactly the same, the pattern is counted as misclassified [9]. This procedure is applied to all test data, and the total number of misclassified patterns is expressed as a percentage of the size of the test data set:

    CEP = (number of misclassified samples / total size of test data set) × 100.  (8)

4.1.2. Classification efficiency

Classification efficiency is obtained using both the training and the test data. The classification matrix is used to obtain the statistical measures for the class-level performance (individual efficiency) and the global performance (average and overall efficiency) of the classifier [24]. The individual efficiency is indicated by the percentage classification, which tells us how many samples belonging to a particular class have been correctly classified. The percentage classification (ηi) for class ci is given by Eq. (9):

    ηi = qii / (Σ(j=1..n) qji)  (9)

where qii is the number of correctly classified samples and n is the number of samples for class ci in the data set. The global performance measures are the average (ηa) and overall (ηo) classification efficiencies, defined as

    ηa = (1/nc) Σ(i=1..nc) ηi  (10)

    ηo = (1/N) Σ(i=1..nc) qii  (11)

where nc is the total number of classes and N is the number of patterns.

4.2. Illustrative example

We illustrate how the Firefly Algorithm (FA) is used for clustering with the following synthetic data. Although the proposed algorithm can be used for any type of mixture model, we focus on a Gaussian mixture. Let us consider a mixture of two Gaussians with two input features, namely x and y. The mean values µ1 = [8, 8]^T and µ2 = [16, 16]^T and the covariance matrix (x, y) = {(6, 3); (3, 2)} are assumed, and each class has an equal number of samples. In our experiments, 100 samples are generated randomly for each class; of these, 75 data points are used for training and the remaining 25 are used for testing in each class. The generated synthetic data is shown in Fig. 1.

Fig. 1. Data distribution (Class 1 and Class 2, training and testing data).

We use the firefly algorithm on the training data to obtain the cluster centers. Let xi be one of the solutions (a set of cluster centers) and Ji be the objective function value for these cluster centers. We consider a population of 5 fireflies at locations x1, x2, x3, x4 and x5 within the 2d-dimensional search space. We evaluate the fitness of the population, J1, J2, J3, J4 and J5, using Eq. (7), which is directly proportional to the light intensities I1, I2, I3, I4 and I5. We then compare the intensity values of the fireflies: following the pseudo-code, if (I1 < I2), firefly 2 moves toward firefly 1 using Eq. (4). Similarly, all the agents are compared

and the movement phase of each agent is updated by evaluating new solutions and updating the light intensity using Eq. (1). This procedure is continued until it converges to the optimal cluster centers ximin, as shown in Fig. 2. The cluster centers generated are

    (x, y) = {(7.2233, 7.6659); (16.0580, 16.3230)}.

Fig. 2. Optimal cluster centers (agents, agent movements and cluster centers).

The classification result using the test data set and the class centers found by the firefly algorithm has a zero classification error percentage. For the entire data set, the individual, average and overall efficiencies are 100%.

5. Results and discussion

In this work, we present the results obtained using the Firefly Algorithm (FA) on 13 typical benchmark data sets which are well known in the literature (UCI database repository [25]). First, we describe the characteristics of the standard classification data sets. Next, we present the results obtained from the FA for the 13 benchmark problems. Finally, we present the comparison of the FA with two other nature inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and the nine other methods used in the literature [9,14], and analyze their performance.

5.1. Data set description

The 13 classification data sets are well-known and widely used benchmarks in the machine learning community. The number of patterns, the number of input features and the number of classes are presented in Table 1. These 13 benchmark problems are chosen exactly as in [9,14], to make a reliable comparison. Each data set is segregated into two parts: 75% of the data is used for training and the remaining 25% is used as testing samples. The sizes of the training and test sets can be found in Table 1. After training, we obtain the cluster centers (extracted knowledge) that can be used for classifying the test data set.

Table 1
Properties of the problems.

Data           Size   Train   Test   Input   Class
Balance         625     469    156       4       3
Cancer          569     427    142      30       2
Cancer-Int      699     524    175       9       2
Credit          690     518    172      15       2
Dermatology     366     274     92      34       6
Diabetes        768     576    192       8       2
E.Coli          327     245     82       7       5
Glass           214     161     53       9       6
Heart           303     227     76      35       2
Horse           364     273     91      58       3
Iris            150     112     38       4       3
Thyroid         215     162     53       5       3
Wine            178     133     45      13       3

The problems considered in this work can be briefly described as follows.

Data set 1: The Balance data set is based on balance scale weight and distance. It contains 625 patterns, which are split into 469 for training and 156 for testing. There are 4 integer-valued attributes and 3 classes.

Data sets 2 and 3: The Cancer and Cancer-Int data sets are based on the "breast cancer Wisconsin — Diagnostic" and "breast cancer Wisconsin — Original" data sets, respectively. Each has 2 classes, with a tumor being either benign or malignant. The Cancer data set contains 569 patterns with 30 attributes, and the Cancer-Int data set contains 699 patterns with 9 attributes.

Data set 4: The Credit data set is based on Australian credit card data for assessing applications for credit cards. There are 690 patterns (applicants), 15 input features, and 2 output classes.

Data set 5: The Dermatology data set is based on the differential diagnosis of erythemato-squamous diseases. There are 6 classes, 366 samples, and 34 attributes.

Data set 6: The Pima Diabetes data set has 768 instances of 8 attributes and two classes, which determine whether the detection of diabetes is positive (class A) or negative (class B).

Data set 7: The Escherichia coli data set is based on the cellular localization sites of proteins. The original data set has 336 patterns in 8 classes, but 3 classes are represented by only 2, 2 and 5 patterns. These 9 examples are therefore omitted, leaving 327 patterns, 5 classes and 7 attributes.

Data set 8: The Glass data set defines glass type in terms of oxide content. The nine inputs are based on 9 chemical measurements, with one of 6 types of glass. The data set contains 214 patterns, which are split into 161 for training and 53 for testing.

Data set 9: The Heart data set is based on the diagnosis of heart disease. It contains 76 attributes for each pattern, 35 of which are used as input features. The data is based on the Cleveland Heart data from the repository, with 303 patterns and 2 classes.

Data set 10: The Horse data set is used to predict the fate of a horse with colic and to classify whether the horse will die, will survive, or will be euthanized. The data set contains 364 patterns, each of which has 58 inputs from 27 attributes, and 3 classes.

Data set 11: The Iris data set consists of three varieties of iris flowers: setosa, virginica and versicolor. There are 150 instances and 4 attributes that make up the 3 classes.

Data set 12: The Thyroid data set is based on the diagnosis of thyroid hyper- or hypofunction. The data set contains 215 patterns, 5 attributes and 3 classes.

Data set 13: The Wine data set was obtained from the chemical analysis of wines derived from three different cultivars. The data set contains 3 types of wine, with 178 patterns and 13 attributes.

5.2. Results obtained using the firefly algorithm

In this section, we discuss the results obtained using the Firefly Algorithm (FA) on the 13 benchmark problems and compare the FA with the 11 other methods used in the literature, based on the performance measures.

5.2.1. FA clustering and parameter setting

The fireflies are initialized randomly in the search space. The parameter values used in our algorithm are

    Number of fireflies (N) = 20
    Attractiveness (β0) = 1
    Light absorption coefficient (γ) = 1
    Number of generations (T) = 100.

For most applications, the same parameter values are suggested by Yang [18]. After the fireflies are deployed randomly within the search space, the parameter β0 = 1 corresponds to a scheme of cooperative local search in which the brightest firefly strongly determines the positions of the other fireflies, especially in its neighborhood. The parameter value γ = 1 determines the variation of light intensity with increasing distance from the communicating firefly (in the limit of large γ, the search becomes completely random).

The number of function evaluations in the firefly algorithm can be obtained as follows. Let N be the size of the initial population and T the maximum number of generations. The number of function evaluations in each generation is then N(N − 1)/2, and the total number of function evaluations is (N(N − 1)/2) × T. In our studies, we have used 100 as the maximum number of generations. The number of function evaluations for each of the 13 classification data sets (with N = 20 and T = 100) in one simulation run is therefore 19 000.

5.2.2. Analysis of Classification Error Percentage using FA

In [9,14], the Classification Error Percentage (CEP) measure is used with all 13 benchmark data sets. Falco et al. [14] compared the performance of the PSO algorithm with nine other methods, namely Bayes Net [26], MultiLayer Perceptron Artificial Neural Network (MLP) [27], Radial Basis Function Artificial Neural Network (RBF) [28], KStar [29], Bagging [30], MultiBoostAB [31], Naive Bayes Tree (NBTree) [32], Ripple Down Rule (Ridor) [33] and Voting Feature Interval (VFI) [34]. Karaboga and Ozturk [9] implemented the ABC algorithm and analyzed the CEP against all the above-mentioned methods. In this study, in addition to these methods [9,14], we analyze the CEP measure of the FA to make a reliable comparison.

From the training data set, the knowledge in the form of cluster centers is obtained using the Firefly Algorithm (FA). The test data sets are then classified against these cluster centers and the CEP values are obtained. The results of the nature inspired techniques (FA, ABC and PSO) are given in Table 2, where the CEP values are presented. The FA performs at least as well as the ABC and PSO algorithms on all 13 problems, whereas the ABC algorithm's result is better than that of the PSO algorithm on 12 of the 13 problems (the exception being the Glass problem) in terms of classification error. Moreover, the average classification error percentage over all problems is also better for the FA (11.36%) compared to that of the ABC (13.13%) and the PSO (15.99%).

Table 2
Average classification error percentages using nature inspired techniques on the test data sets.

               FA      ABC     PSO
Balance        14.1    15.38   25.47
Cancer         1.06    2.81    5.8
Cancer-Int     0       0       2.87
Credit         12.79   13.37   22.96
Dermatology    5.43    5.43    5.76
Diabetes       21.88   22.39   22.5
E.Coli         8.54    13.41   14.63
Glass          37.74   41.5    39.05
Heart          13.16   14.47   17.46
Horse          32.97   38.26   40.98
Iris           0       0       2.63
Thyroid        0       3.77    5.55
Wine           0       0       2.22

In Table 3, the CEP measures of the FA and the 11 methods given in [9,14] are presented; the ranking, based on the ascending order of the classification error of the classifiers on each problem, is also given in parentheses. At a glance, one can easily see that the FA obtains the best solution on 8 of the 13 problems. To make a good comparison of all the algorithms, Tables 4 and 5 are reported. Table 4 shows the average classification error over all problems, with the ranking based on the ascending order of average classification error. Table 5 shows, for each algorithm, the sum of its per-problem rankings, again ranked in ascending order.

From Table 4, we can observe that, based on the average CEP values, the FA is the best in comparison with the MLP artificial neural network technique and the ABC, while the MLP performed better than the ABC. However, even if the results in the table are comparable, this view may cause some significant points to be disregarded, since the distributions of the error rates are not proportional. Therefore, the general ranking of the techniques in Table 5 is obtained by calculating the sum of the ranks of each problem from Table 3. From this ranking, once again the FA is the best, with the ABC algorithm in second position and the BayesNet technique in third position. The classification error rates and rankings from the tables show that clustering with the FA offers a superior generalization capability. Note that here we use the results of all the other methods as given in the earlier studies [9,14]; only the FA results are new.

5.2.3. Analysis of classification efficiency using FA

In the previous section, we presented the results obtained using the CEP. The CEP alone does not indicate how efficient the algorithm is. To analyze any classifier, it is important to check the individual classification efficiency on the test samples, and also the average and overall efficiencies on the complete data set. Using the same cluster centers, the average and overall efficiencies for the entire data set are obtained.

i. Significance of individual efficiency: For a test data set, the main significance of the individual efficiency is to analyze the class-level performance of a classifier. From Table 6, we can observe the individual classification efficiencies of the test samples: the Cancer-Int, Iris, Thyroid and Wine data sets are classified without any misclassification and hence have individual efficiencies of 100%. In the case of the Balance data set, the individual efficiency of Class 2 is 66.7%. In the Credit and Diabetes data sets, Class 1 is misclassified as Class 2, with an individual efficiency of 70%, whereas in the Heart and Dermatology data sets, Class 2 has lower individual efficiencies of 73.1% and 50%, respectively.

From Table 3, we can observe that for the Heart classification problem (a 2-class problem), the FA performed better than all the other classifiers, with a CEP value of 13.16. This does not mean that the individual efficiency of each class is good. To illustrate this in more detail, consider the Heart data set: from Table 6 we can observe that Class 1 has an impressive individual efficiency of 94%, whereas most of the samples belonging to Class 2 are misclassified as Class 1, giving an individual efficiency of 73.1%. Hence it is important to consider the individual efficiency when analyzing the class-level performance of a clustering algorithm.

ii. Performance of average and overall efficiency: For the entire data set, it is always necessary to know the global performance of an algorithm. This can be assessed using the average and overall efficiencies. Table 6 reports the average and overall efficiencies of the firefly algorithm on the 13 benchmark data sets for the entire data set. The Balance data set has average and overall efficiencies of 74.9% and 80.8% respectively, whereas the Cancer data set has average and overall efficiencies of 91% and 92.5% respectively. The average and overall efficiencies of the Cancer-Int, Dermatology, Iris and Wine data sets are 97.9%, 81.9%, 94.7% and 90.6% respectively. The average efficiencies of Credit, Diabetes, E.Coli, Heart and Thyroid are 75.5%, 73.4%, 88.5%, 77.1% and 92.6%. The overall efficiencies of Credit, Diabetes, E.Coli, Heart and Thyroid

Table 3
Average classification error percentages, and per-problem ranking, of the techniques given in [9,14] and of the FA.
FA ABC PSO BayesNet MlpAnn RBF KStar Bagging MultiBoost NBTree Ridor VFI

Balance 14.1 15.38 25.47 19.74 9.29 33.61 10.25 14.77 24.20 19.74 20.63 38.85
(3) (5) (10) (6) (1) (11) (2) (4) (9) (6) (8) (12)
Cancer 1.06 2.81 5.80 4.19 2.93 20.27 2.44 4.47 5.59 7.69 6.36 7.34
(1) (3) (8) (5) (4) (12) (2) (6) (7) (11) (9) (10)
Cancer-Int 0 0 2.87 3.42 5.25 8.17 4.57 3.93 5.14 5.71 5.48 5.71
(1) (1) (3) (4) (8) (12) (6) (5) (7) (10) (9) (10)
Credit 12.79 13.37 22.96 12.13 13.81 43.29 19.18 10.68 12.71 16.18 12.65 16.47
(5) (6) (11) (2) (7) (12) (10) (1) (4) (8) (3) (9)
Dermatology 5.43 5.43 5.76 1.08 3.26 34.66 4.66 3.47 53.26 1.08 7.92 7.60
(6) (6) (8) (1) (3) (11) (5) (4) (12) (1) (10) (9)
Diabetes 21.88 22.39 22.50 25.52 29.16 39.16 34.05 26.87 27.08 25.52 29.31 34.37
(1) (2) (3) (4) (8) (12) (10) (6) (7) (4) (9) (11)
E.Coli 8.54 13.41 14.63 17.07 13.53 24.38 18.29 15.36 31.70 20.73 17.07 17.07
(1) (2) (4) (6) (3) (11) (9) (5) (12) (10) (6) (6)
Glass 37.73 41.50 39.05 29.62 28.51 44.44 17.58 25.36 53.70 24.07 31.66 41.11
(7) (10) (8) (5) (4) (11) (1) (3) (12) (2) (6) (9)
Heart 13.16 14.47 17.46 18.42 19.46 45.25 26.70 20.25 18.42 22.36 22.89 18.42
(1) (2) (3) (4) (7) (12) (11) (8) (4) (9) (10) (4)
Horse 32.97 38.26 40.98 30.76 32.19 38.46 35.71 30.32 38.46 31.86 31.86 41.75
(6) (8) (11) (2) (5) (9) (7) (1) (9) (3) (3) (12)
Iris 0 0 2.63 2.63 0 9.99 0.52 0.26 2.63 2.63 0.52 0
(1) (1) (8) (8) (1) (12) (6) (5) (8) (8) (6) (1)
Thyroid 0 3.77 5.55 6.66 1.85 5.55 13.32 14.62 7.40 11.11 8.51 11.11
(1) (3) (4) (6) (2) (4) (11) (12) (7) (9) (8) (9)
Wine 0 0 2.22 0 1.33 2.88 3.99 2.66 17.77 2.22 5.10 5.77
(1) (1) (5) (1) (4) (8) (9) (7) (12) (5) (10) (11)
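Each "Average" entry in Table 4 is the mean of the corresponding column of Table 3 over the 13 problems. As a quick check, the FA column of Table 3 averages to the 11.36 reported in Table 4 (a sketch; values transcribed from Table 3):

```python
# FA's per-problem classification error percentages (CEP), transcribed
# from Table 3: Balance, Cancer, Cancer-Int, Credit, Dermatology,
# Diabetes, E.Coli, Glass, Heart, Horse, Iris, Thyroid, Wine.
fa_cep = [14.1, 1.06, 0.0, 12.79, 5.43, 21.88, 8.54,
          37.73, 13.16, 32.97, 0.0, 0.0, 0.0]

# Average CEP over the 13 benchmark problems.
average_cep = sum(fa_cep) / len(fa_cep)
print(round(average_cep, 2))  # 11.36, the FA entry in Table 4
```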
Table 4
Average classification error percentages and general ranking of the techniques on all problems.

          FA     ABC    PSO    BayesNet  MlpAnn  RBF    KStar  Bagging  MultiBoost  NBTree  Ridor  VFI
Average   11.36  13.13  15.99  13.17     12.35   26.93  14.71  13.3     22.92       14.68   15.38  18.89
Rank      1      3      9      4         2       12     6      5        11          7       8      10
Table 5
The sum of ranking of the techniques and general ranking based on the total ranking.

        FA  ABC  PSO  BayesNet  MlpAnn  RBF  KStar  Bagging  MultiBoost  NBTree  Ridor  VFI
Total   35  50   86   54        57      137  89     67       110         86      97     113
Rank    1   2    6    3         4       12   8      5        10          7       9      11
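The rankings in Tables 4 and 5 follow the same rule: sort in ascending order (of average CEP in Table 4, of the rank sum in Table 5) and give rank 1 to the smallest value. A sketch of the Table 4 ranking, with the average CEPs transcribed from that table:

```python
# Average CEP per technique, transcribed from Table 4.
avg_cep = {"FA": 11.36, "ABC": 13.13, "PSO": 15.99, "BayesNet": 13.17,
           "MlpAnn": 12.35, "RBF": 26.93, "KStar": 14.71, "Bagging": 13.3,
           "MultiBoost": 22.92, "NBTree": 14.68, "Ridor": 15.38, "VFI": 18.89}

# Rank 1 goes to the lowest average classification error percentage.
ordered = sorted(avg_cep, key=avg_cep.get)
rank = {name: position + 1 for position, name in enumerate(ordered)}
print(rank["FA"], rank["MlpAnn"], rank["RBF"])  # 1 2 12
```

The same procedure applied to the rank sums reproduces the general ranking of Table 5, with the FA again first.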
Table 6
FA best classification efficiency.

     Balance  Cancer  Cancer-Int  Credit  Dermatology  Diabetes  E.Coli  Glass  Heart  Horse  Iris  Thyroid  Wine
η1   90.3     96.7    100         70      96.7         70        100     66.7   94     81.7   100   100      100
η2   66.7     100     100         92.4    50           82.8      80      40     73.1   40.9   100   100      100
η3   88.9     –       –           –       100          –         100     60     –      44.4   100   100      100
η4   –        –       –           –       100          –         80      80     –      –      –     –        –
η5   –        –       –           –       100          –         94.7    100    –      –      –     –        –
η6   –        –       –           –       100          –         –       90     –      –      –     –        –
ηa   74.9     91      97.9        75.5    81.9         73.4      88.5    70.8   77.1   53.7   94.7  92.6     90.6
ηo   80.8     92.5    97.9        78.7    81.9         75.9      89.2    61.6   78.7   66.1   94.7  94.2     90.6
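The efficiencies in Table 6 can be recovered from a classification (confusion) matrix: the individual efficiency ηi is the fraction of class-i samples labelled correctly, ηa is the mean of the ηi, and ηo is the fraction of all samples labelled correctly. A sketch using a hypothetical confusion matrix consistent with the Iris results reported later in Table 9 (50 samples per class; only Class 2 and Class 3 samples are confused — the off-diagonal counts are our assumption):

```python
# Hypothetical Iris confusion matrix: rows are true classes, columns are
# predicted classes; the off-diagonal counts (2 and 6) are chosen so the
# per-class efficiencies match Table 9 (100%, 96%, 88%).
confusion = [[50, 0, 0],
             [0, 48, 2],
             [0, 6, 44]]

total = sum(sum(row) for row in confusion)
individual = [row[i] / sum(row) for i, row in enumerate(confusion)]  # eta_i
eta_a = sum(individual) / len(individual)                            # average
eta_o = sum(confusion[i][i] for i in range(3)) / total               # overall

print([round(100 * e, 1) for e in individual])       # [100.0, 96.0, 88.0]
print(round(100 * eta_a, 1), round(100 * eta_o, 1))  # 94.7 94.7
```

With equal class sizes ηa and ηo coincide, which is why the Iris column shows 94.7 for both; for unequal class sizes (e.g. Glass, 70.8 vs. 61.6) they differ.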
are 78.7%, 75.9%, 89.2%, 78.7% and 94.2% respectively. In the case of the Glass and Horse data sets the FA has lower average efficiencies of 70.8% and 53.7%, and overall efficiencies of 61.6% and 66.1% respectively.

5.2.4. Comparison of classification efficiency of the nature inspired techniques using the Iris data set

From Table 6 we can observe that, for the standard benchmark problems Cancer-Int, Iris, Thyroid and Wine, there is no misclassification in any of their individual classes, i.e., all the test data are classified correctly and hence the CEP value is 0. This does not mean that the overall efficiency is 100%. To illustrate this in more detail, let us consider the Iris data set: compared to Cancer-Int, Thyroid and Wine, it has fewer input features. The FA, ABC and PSO are in the same class of population-based, nature inspired optimization techniques. Here we compare the three nature inspired techniques to extract knowledge in the form of cluster centers, and the performance is analyzed using classification efficiency. The cluster centers generated using the FA, ABC and PSO for the Iris training data are shown in Table 7. The cluster centers obtained for ABC match those in the published literature [35]. The parameter values used for the FA are as given in Section 5.2.1; for ABC and PSO we use the parameter values from [9,14] respectively.

From Table 8, we can observe that in comparison with the other nature inspired techniques the optimal and mean fitness values obtained using the FA are better than those of ABC and PSO. Being a continuous optimization problem, the cluster centers initially picked by the population are not the best optimal points in the search space. Therefore the selection of new cluster centers after fitness
Table 7
Optimal cluster centers for the Iris data set using the nature inspired techniques.

Nature inspired technique   Feature A   Feature B   Feature C   Feature D
FA     5.0538   3.4813   1.4665   0.2895
       5.9432   2.7627   4.4195   1.4106
       6.3929   2.9021   5.1321   2.3328
ABC    5.0061   3.4171   1.4641   0.2466
       5.9026   2.7483   4.3937   1.4349
       6.8501   3.0737   5.7433   2.0711
PSO    5.0416   3.4433   1.4796   0.2475
       6.0431   2.9738   4.327    1.3649
       6.9410   2.9321   5.4664   1.388

set [36]. An earlier study showed that the proper selection of the training data set improves the performance measure [36]. In our study, we have selected 75% of the training data set randomly and tabulated the results based on the most favorable performance measure for the selected training data set.
Overall, for most of the data sets, the FA has good global performance. Looking at the accuracy and robustness of the FA, we can claim that it can be used for the clustering problems studied in this paper.

6. Discussions and conclusions

This paper investigates a new nature inspired algorithm—
Table 8
Results of the nature inspired techniques after 20 runs for optimal cluster centers using the Iris data set.

Nature inspired technique   Optimal   Worst    Average   Standard deviation
FA     0.4880   0.5928   0.4897   0.0087
ABC    0.4891   0.4957   0.4903   0.0018
PSO    0.4943   0.9115   0.5186   0.0668

the FA is used for clustering and evaluating its performance. The FA algorithm is compared with ABC and PSO, as all these methods are in the same class of population-based, nature inspired optimization techniques. As in other population-based algorithms, the performance of the FA depends on the population size, β and γ. In the FA algorithm, a firefly particle moves toward another firefly particle which has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two firefly particles (r). The effect of the
values of β and γ are discussed in [18]. When the value of r is large/small, the firefly will move a small/large distance. This will affect the computation time of the algorithm. In PSO each particle moves a distance based on its personal best and the global best. In ABC each bee particle position is compared twice with the best particle position. In the FA algorithm only the distance is necessary for the movement. The performance measure (CEP) helps us to examine which method has generated the optimal cluster centers.

Table 9
Comparison of classification efficiency for the Iris data set.

Classification efficiency   FA (%)   ABC (%)   PSO (%)
η1   100    100    100
η2   96     96     92
η3   88     72     74
ηa   94.7   89.3   88.7
ηo   94.7   89.3   88.7
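The firefly movement described above can be sketched as Yang's standard position update [18]: firefly i is pulled toward a brighter firefly j with an attractiveness β0·exp(−γr²) that decays with the squared distance r², plus a small random step scaled by α. This is an illustrative sketch under those assumptions, not the exact implementation used in the experiments:

```python
import math
import random

def move_firefly(xi, xj, beta0=1.0, gamma=1.0, alpha=0.0, rng=random):
    """One FA move of firefly xi toward a brighter firefly xj:
    x_i <- x_i + beta0 * exp(-gamma * r^2) * (x_j - x_i) + alpha * (rand - 0.5)."""
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))   # squared distance r^2
    beta = beta0 * math.exp(-gamma * r2)             # attractiveness at distance r
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]

# With alpha = 0 the move is deterministic: the firefly steps a fraction
# beta of the way toward the brighter firefly, so their distance shrinks.
xi, xj = [0.0, 0.0], [1.0, 1.0]
xi_new = move_firefly(xi, xj)
```

When r is large the attractiveness is small, so distant fireflies take only small steps toward each other — the computation-time effect noted above.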
evaluation is effectively scanned iteratively till all the particles converge to an optimal result, i.e., in the form of cluster centers. In the FA algorithm, a firefly particle moves toward another firefly particle which has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two firefly particles (r). When the value of r is large/small, the firefly will move a small/large distance. This will affect the computation time of the algorithm. In PSO each particle moves a distance based on its personal best and the global best. In ABC each bee particle position is compared twice with the best particle position. Further, the cluster centers generated by these algorithms are analyzed: the distances between the given data and the cluster centers are computed, and each data point is assigned to the cluster center (class) that has the minimum distance. The performance measure helps us to examine which method has generated the optimal cluster centers.

The classification matrix for the entire Iris data set is shown in Table 9. From this table, we can observe that, for all the nature inspired algorithms, samples belonging to Class 2 and Class 3 are misclassified as Class 3 and Class 2 respectively. For the optimal cluster centers generated by the FA and ABC, the individual efficiency of Class 2 is 96%, whereas PSO has 92%. The individual efficiency of Class 3 using the FA is 88%, which is better in comparison with ABC and PSO at 72% and 74% respectively. In all the algorithms, Class 1 is classified without any misclassification and hence has an individual efficiency of 100%, as it is linearly separable in comparison with the other two classes. Also, the average and overall efficiency is better in the case of the FA, with 94.7% compared to 89.3% for ABC and 88.7% for PSO. Hence it is important to consider the individual, average and overall efficiency in a multi-class classification problem for the generated cluster centers.

It is important to note that the performance of clustering mainly depends on the size and quality of the training data set. There are some methods available in the literature for the selection of the training data

The clustering task on 13 benchmark data sets is accomplished successfully by the procedure of partitional clustering using a recent nature inspired technique—the Firefly Algorithm (FA). Clustering is an important technique to identify homogeneous clusters (or classes) such that the patterns belonging to a cluster center share a high degree of affinity while being very dissimilar to other clusters. The performance of the FA using classification error percentage is compared with two other nature inspired techniques—Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO)—and nine other methods which are widely used by researchers. The performance measure using classification efficiency—the individual, average and overall efficiency of the FA—is analyzed using 13 benchmark problems. From the results obtained, we can conclude that the FA is an efficient, reliable and robust method, which can be applied successfully to generate optimal cluster centers.

Acknowledgments

The authors would like to thank the reviewers for their comments, which were useful during the revision of this study.

References

[1] M.R. Anderberg, Cluster Analysis for Application, Academic Press, New York, 1973.
[2] J.A. Hartigan, Clustering Algorithms, Wiley, New York, 1975.
[3] P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[4] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, 1988.
[5] H. Frigui, R. Krishnapuram, A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 450–465.
[6] Y. Leung, J. Zhang, Z. Xu, Clustering by scale-space filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1396–1410.
[7] C. Ding, X. He, Cluster merging and splitting in hierarchical clustering algorithms, in: Proc. IEEE ICDM, 2002, pp. 1–8.
[8] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Publishers, Dordrecht, 1996.
[9] D. Karaboga, C. Ozturk, A novel cluster approach: Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 11 (1) (2010) 652–657.
[10] S.Z. Selim, M.A. Ismail, K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 81–87.
[11] E. Falkenauer, Genetic Algorithms and Grouping Problems, Wiley, Chichester, 1998.
[12] Y. Kao, K. Cheng, An ACO-based clustering algorithm, in: M. Dorigo, et al. (Eds.), ANTS, in: LNCS, vol. 4150, Springer, Berlin, 2006, pp. 340–347.
[13] R. Younsi, W. Wang, A new artificial immune system algorithm for clustering, in: Z.R. Yang (Ed.), LNCS, vol. 3177, Springer, Berlin, 2004, pp. 58–64.
[14] I. De Falco, A.D. Cioppa, E. Tarantino, Facing classification problems with particle swarm optimization, Applied Soft Computing 7 (3) (2007) 652–658.
[15] T. Niknam, B. Amiri, J. Olamaei, A. Arefi, An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering, Journal of Zhejiang University: Science A 10 (4) (2009) 512–519.
[16] T. Niknam, E. Taherian Fard, N. Pourjafarian, A.R. Rousta, An efficient hybrid algorithm based on modified imperialist competitive algorithm and k-means for data clustering, Engineering Applications of Artificial Intelligence 24 (2) (2011) 306–317.
[17] T. Niknam, B. Amiri, An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis, Applied Soft Computing 10 (1) (2010) 183–197.
[18] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[19] D. Karaboga, A. Basturk, On the performance of Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 8 (1) (2008) 687–697.
[20] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: IEEE Intl. Conf. on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[21] J. Tyler, Glow-worms. http://website.lineone.net/galaxypix/Tylerbookpt1.html.
[22] Y. Marinakis, M. Marinaki, M. Doumpos, N. Matsatsinis, C. Zopounidis, A hybrid stochastic genetic-GRASP algorithm for clustering analysis, Operational Research An International Journal 8 (1) (2008) 33–46.
[23] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264–323.
[24] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (7–9) (2008) 1345–1358.
[25] C.L. Blake, C.J. Merz, University of California at Irvine Repository of Machine Learning Databases, 1998. http://www.ics.uci.edu/mlearn/MLRepository.html.
[26] F. Jensen, An Introduction to Bayesian Networks, UCL Press, Springer-Verlag, 1996.
[27] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representation by back propagation errors, Nature 323 (1986) 533–536.
[28] M.H. Hassoun, Fundamentals of Artificial Neural Networks, The MIT Press, Cambridge, 1995.
[29] J.G. Cleary, L.E. Trigg, K*: an instance-based learner using an entropic distance measure, in: Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 108–114.
[30] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[31] G.I. Webb, Multiboosting: a technique for combining boosting and wagging, Machine Learning 40 (2) (2000) 159–196.
[32] R. Kohavi, Scaling up the accuracy of Naive–Bayes classifiers: a decision tree hybrid, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 202–207.
[33] P. Compton, R. Jansen, Knowledge in context: a strategy for expert system maintenance, in: Proceedings of Artificial Intelligence, in: LNAI, vol. 406, Springer-Verlag, Berlin, 1988, pp. 292–306.
[34] G. Demiroz, A. Guvenir, Classification by voting feature intervals, in: Proceedings of the Seventh European Conference on Machine Learning, 1997, pp. 85–92.
[35] C. Zhang, D. Ouyang, J. Ning, An artificial bee colony approach for clustering, Expert Systems with Applications 37 (7) (2010) 4761–4767.
[36] T. Yoshida, S. Omatu, Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing 32 (5) (1994) 1103–1109.