
Application of Inclined Planes system Optimization on Data Clustering


Mohammad Hamed Mozaffari, Hamed Abdi, Hamid Zahiri

Abstract—Data mining is a branch of science which aims to extract a series of features and meaningful information from a huge database at reasonable time and cost. Clustering is one of the popular methods in this field. The purpose of clustering is to take a database and group together its items with similar characteristics. Applications of clustering to many fields of science and engineering, such as pattern recognition, data retrieval, bio-informatics, machine learning and the Internet, have caused it to develop significantly in the last decades. The rapid growth in the volume of information stored in databases has revealed the weakness of traditional methods like K-means when facing huge data. In this paper a new clustering method based on the Inclined Planes system Optimization algorithm is proposed and evaluated on a series of standard datasets. A comparison study revealed a significant superiority over other similar clustering algorithms.

Index Terms – Heuristic Algorithms, Inclined Planes system Optimization, Clustering, Data mining.

I. INTRODUCTION

Solving real-life problems by natural computing methods, which are inspired by natural and physical phenomena, is a novel approach recognized in the last decades.

Data clustering is a data-mining method used in many fields of science for finding similar items in datasets, for purposes such as searching for a word in an Internet database, diagnosing a cancer type, or gene extraction in bio-informatics.

The purpose of clustering methods is to take a database and partition its items into a number of groups such that each group's (cluster's) items have similar characteristics to each other (similarity) while having the lowest similarity to the items of other groups (dissimilarity). Data features (mostly distances between items) are used for measuring similarity and dissimilarity through a criterion function. Clustering methods are divided into two categories, supervised and unsupervised. In supervised clustering methods like K-means, the number of clusters is known and selected by the user, while unsupervised clustering methods reach the best number of clusters automatically.

The purpose of heuristic algorithms is to reach the optimum solution of an optimization problem by taking inspiration from natural and physical phenomena. Heuristic algorithms have been introduced in the last decades in various forms. New clustering methods aim at the combination of past methods and heuristic algorithms.

The most popular and famous of these algorithms are the Genetic Algorithm (GA) [1], Simulated Annealing (SA) [2], Harmony Search (HS) [3], the Artificial Immune System (AIS) [4], Ant Colony Optimization (ACO) [5], and Particle Swarm Optimization (PSO) [6].

The Genetic Algorithm was formed by observing the laws of natural selection and genetics, based on Darwin's theory of evolution. Simulated Annealing was designed using the process of annealing in metallurgy. The Harmony Search algorithm mimics musicians' behavior in the process of improvisation. The Artificial Immune System was inspired by the biological immune system. Ant Colony Optimization simulates the foraging behavior of real ants as they search for food, and Particle Swarm Optimization was developed by simulating the social behavior of flocks of birds when migrating. Stochastic behavior and the use of randomized phenomena are a usual strategy for these algorithms to simulate natural characteristics close to their actual pattern, while some other algorithms, like Central Force Optimization (CFO), involve no randomization: CFO is a deterministic heuristic algorithm based on the metaphor of gravitational kinematics [7].

In this study a new heuristic algorithm, Inclined Planes system Optimization (IPO), is used for solving the supervised clustering problem [8]. The performance of the proposed method was tested on a number of standard benchmarks. A comparison study with similar methods revealed the reliability and power of this method for the clustering problem.

The rest of this paper is organized as follows. The IPO algorithm is demonstrated in Section II, Section III offers a brief explanation of supervised data clustering, experimental results are provided in Section IV, and finally Section V concludes the paper.

M. H. Mozaffari, H. Abdi and H. Zahiri are with the Department of Electrical Engineering, University of Birjand, Birjand, Iran (e-mails: hamed.mozaffari2@yahoo.com, hamed.abdy@gmail.com and shzahiri@yahoo.com).
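As a concrete point of reference for supervised clustering with a user-selected number of clusters, the following is a minimal K-means sketch. It is illustrative only; the function and variable names are our own and are not taken from [11]:

```python
import numpy as np

def kmeans(data, k, iters=100, seed=0):
    """Minimal K-means: the number of clusters k is fixed in advance."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct random data points.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign each item to its nearest center (Euclidean distance).
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its assigned items;
        # keep an empty cluster's center where it is.
        new = np.array([data[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

With two well-separated groups of points and k = 2, the returned labels separate the groups; the weakness noted above is that k must be guessed correctly beforehand.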
II. INCLINED PLANES SYSTEM OPTIMIZATION (IPO)

The IPO method is designed based on the dynamics of sliding motion along a frictionless inclined surface. In this algorithm each agent, named a "tiny ball" (similar to the particles in PSO or the ants in ACO), searches the problem space to find the optimum solution. In nature, on the surface of the Earth, an elevated object (here, a tiny ball) automatically moves toward lower elevation levels, losing potential energy as it descends. This phenomenon is caused by the gravitational force which the Earth applies to every object.

In the IPO algorithm, each ball has three specifications: position ($x$), height ($h$) and angles ($\varphi$) in relation to the other balls. The balls' positions are feasible solutions of the problem, and the fitness function is used to calculate the height of each ball. For a clear explanation, assume a system of $N$ balls:

$$x_i^{\min} \le x_i \le x_i^{\max}, \quad 1 \le i \le N_d \qquad (1)$$

where $x_i$ is a decision variable, $i$ is the dimension and $N_d$ is the number of dimensions.

Fig. 1. An example of search space with 3 balls.

An example of the search space in the IPO algorithm is shown in Fig. 1, with 3 balls. The purpose of IPO is to find the minimum of the fitness function $f(x_1, x_2, \ldots, x_{N_d})$ defined on the problem space. The angle between the i-th ball and the j-th one at time $t$ is calculated by equation (2), where $f_i(t)$ is the value of the fitness function (the height) of the i-th ball at time $t$:

$$\varphi_{ij}^d(t) = \tan^{-1}\!\left(\frac{f_j(t) - f_i(t)}{x_i^d(t) - x_j^d(t)}\right), \quad d = 1, \ldots, N_d, \quad i, j = 1, 2, \ldots, N, \; i \ne j \qquad (2)$$

The amount and direction of the acceleration are calculated using the equation below:

$$a_i^d(t) = \sum_{j=1}^{N} U\!\big(f_j(t) - f_i(t)\big) \sin\!\big(\varphi_{ij}^d(t)\big) \qquad (3)$$

where $U(\cdot)$ is the unit step function:

$$U(w) = \begin{cases} 1 & w > 0 \\ 0 & w \le 0 \end{cases} \qquad (4)$$

In each iteration of the IPO algorithm, the position of each ball is updated using equation (5), where $k_1$ and $k_2$ are two constants, $rand_1$ and $rand_2$ are two random variables in the range $[0, 1]$, and $v_i^d(t)$ is the velocity of the i-th ball in dimension $d$ at time $t$:

$$x_i^d(t+1) = k_1 \cdot rand_1 \cdot a_i^d(t) \cdot \Delta t^2 + k_2 \cdot rand_2 \cdot v_i^d(t) \cdot \Delta t + x_i^d(t) \qquad (5)$$

$$v_i^d(t) = \frac{x_{best}^d(t) - x_i^d(t)}{\Delta t} \qquad (6)$$

In equation (6), $x_{best}$ is the best ball position found up to the current iteration [8].

III. PRINCIPLE OF CLUSTERING

Clustering techniques are defined as a problem-solving method with the capacity to divide a number of items in a feature space into a number of groups, where the items in each group have the most similar characteristics to each other while items in two different groups have the least similarity to each other [9]. A dataset with $n$ data items, each in $d$ dimensions, is defined as $O = [O_1, O_2, \ldots, O_n]^T$, where $O_i = (o_i^1, o_i^2, \ldots, o_i^d)$ is the i-th data vector and $o_i^j$ is the j-th feature of the i-th item. Thus $O_{n \times d}$ is the data matrix with $n$ rows and $d$ columns. Clustering methods group the dataset items into $k$ clusters $G = \{C_1, C_2, \ldots, C_k\}$, where the clusters satisfy the following conditions:

$$C_i \ne \varnothing, \quad i = 1, 2, \ldots, k \qquad (7)$$

$$C_i \cap C_j = \varnothing, \quad i, j = 1, 2, \ldots, k, \; i \ne j \qquad (8)$$

$$\bigcup_{i=1}^{k} C_i = O \qquad (9)$$

One of the most important components of a clustering algorithm is the criterion used to find the best cluster centers with a proper fitness; various validity functions and criteria have been designed for this purpose. In this paper the mean-square quantization error (MSE) is used as the fitness function for the IPO algorithm. It is defined in equation (10), where $\lVert O_i - Z_l \rVert$ is the distance between the data item $O_i$ and the cluster center $Z_l$; in this study the Euclidean distance, equation (11), is used as the distance measure:

$$J(O, G) = \sum_{i=1}^{n} \min\big\{ \lVert O_i - Z_l \rVert^2, \; l = 1, 2, \ldots, k \big\} \qquad (10)$$

$$L(O_i, O_j) = \sqrt{\sum_{m=1}^{d} \big(O_i^m - O_j^m\big)^2} \qquad (11)$$

In the proposed method each ball contains $k$ cluster centroids in $d$ dimensions. The fitness of the balls in every iteration is evaluated by the MSE, and better balls move toward the better ones. A ball in the IPO clustering method is encoded as the concatenation

$$\big[\, Z_1 \mid Z_2 \mid \cdots \mid Z_k \,\big] = \big[\, z_1^1\, z_1^2 \cdots z_1^d \mid \cdots \mid z_k^1\, z_k^2 \cdots z_k^d \,\big]$$

where $Z_k$ is the k-th cluster center in $d$ dimensions [8, 9].
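The ball encoding and the MSE fitness described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the function names are our own:

```python
import numpy as np

def decode_ball(ball, k, d):
    """A ball's position encodes k cluster centers, flattened to length k*d."""
    return ball.reshape(k, d)

def mse_fitness(ball, data, k):
    """Mean-square quantization error, eq. (10): the sum over all items of
    the squared Euclidean distance (eq. (11)) to the nearest candidate center."""
    centers = decode_ball(ball, k, data.shape[1])
    # Squared distance of every item to every center via broadcasting:
    # shape (n, k).
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # For each item, keep only the nearest center's squared distance.
    return d2.min(axis=1).sum()
```

A ball whose decoded centers sit on the data yields a low fitness, so balls with better (lower) heights pull the others toward them through the IPO position update of equation (5).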
IV. EXPERIMENTAL RESULTS

To evaluate the performance of the proposed method, it was tested on four well-known standard benchmarks. To compare the data clustering results, the results of four other algorithms were used: Particle Swarm Optimization (PSO), the Gravitational Search Algorithm (GSA) [10], the Genetic Algorithm (GA) and K-means [11]. The results, shown in Table 1, are averages of 20 runs of each algorithm and contain the best, average and worst fitness of each algorithm. The standard datasets used are Iris, Wine, CMC and Cancer [8].

1- Iris: perhaps the most famous dataset in the clustering literature, collected by Anderson (1935). The Iris dataset consists of 150 instances with four numeric features and contains three classes of 50 instances each, where each class refers to a type of iris plant.

2- Wine: there are 178 instances in the Wine dataset, characterized by 13 numeric features resulting from the chemical analysis of three types of wine. There are three categories of data: 59 objects in class 1, 71 objects in class 2, and 48 objects in class 3.

3- Wisconsin Breast Cancer: there are 683 instances with 9 numeric features in this dataset, consisting of 444 objects in class 1 (malignant) and 239 objects in class 2 (benign).

4- Contraceptive Method Choice (CMC): this dataset consists of 1473 samples in 3 classes, characterized by 9 features. There are 629 instances in class 1, 334 instances in class 2 and 510 instances in class 3.

The IPO-clustering parameter setup is $c_1 = c_2 = 1$, $shift_1 = shift_2 = 100$, $scale_1 = scale_2 = 0.02$, and the number of iterations is 500. The coefficients $k_1$ and $k_2$ vary with time and are defined as below:

$$k_1(t) = \frac{c_1}{1 + \exp\big((t - shift_1) \cdot scale_1\big)} \qquad (12)$$

$$k_2(t) = \frac{c_2}{1 + \exp\big((t - shift_2) \cdot scale_2\big)} \qquad (13)$$

TABLE 1. RESULTS OF THE IPO-CLUSTERING ALGORITHM AND COMPARISON WITH OTHER SIMILAR METHODS

Dataset | Result  | K-means  | GA      | PSO     | GSA     | IPO
--------|---------|----------|---------|---------|---------|--------
Iris    | Best    | 97.33    | 113.98  | 96.89   | 96.69   | 96.65
        | Average | 106.05   | 125.19  | 97.23   | 96.72   | 96.65
        | Worst   | 120.45   | 139.77  | 97.89   | 96.76   | 96.67
Wine    | Best    | 16555.6  | 16530.5 | 16345.9 | 16315.3 | 16302.4
        | Average | 18061.0  | 16530.5 | 16417.4 | 16376.6 | 16301.2
        | Worst   | 18563.12 | 16530.5 | 16562.3 | 16425.5 | 16304.3
CMC     | Best    | 5842.2   | 5705.6  | 5700.9  | 5698.1  | 5693.7
        | Average | 5893.6   | 5756.5  | 5820.9  | 5699.8  | 5693.7
        | Worst   | 5934.4   | 5812.6  | 5923.2  | 5702.0  | 5693.8
Cancer  | Best    | 2999.1   | 2999.3  | 2973.5  | 2967.9  | 2964.4
        | Average | 3251.2   | 3249.4  | 3050.0  | 2973.5  | 2964.3
        | Worst   | 3521.5   | 3427.4  | 3318.8  | 2990.8  | 2965

In Table 1, the IPO clustering method shows better results than the other methods on every dataset, without exception, providing a strong indication of its superiority over the other methods.

V. CONCLUSION

This study investigated the power of the Inclined Planes system Optimization algorithm on the supervised data clustering problem. The results on four standard datasets revealed the power and reliability of the IPO-clustering algorithm. Simulation results show that the proposed method can find solutions in proper time and with higher quality than other similar techniques.

VI. REFERENCES

[1] K. S. Tang, K. F. Man, S. Kwong and Q. He, "Genetic algorithms and their applications", IEEE Signal Processing Magazine, vol. 13, no. 6, 1996.
[2] S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, "Optimization by simulated annealing", Science, vol. 220, no. 4598, pp. 671–680, 1983.
[3] Z. Geem, J. Kim and G. Loganathan, "A new heuristic optimization algorithm: harmony search", Simulation, pp. 60–68, 2001.
[4] J. D. Farmer, N. H. Packard and A. S. Perelson, "The immune system, adaptation, and machine learning", Physica D, vol. 22, pp. 187–204, 1986.
[5] M. Dorigo, V. Maniezzo and A. Colorni, "The ant system: optimization by a colony of cooperating agents", IEEE Transactions on Systems, Man, and Cybernetics–Part B, vol. 26, no. 1, pp. 1–13, 1996.
[6] J. Kennedy and R. C. Eberhart, "Particle swarm optimization", Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1995.
[7] R. A. Formato, "Central force optimization: a new nature inspired computational framework for multidimensional search and optimization", Studies in Computational Intelligence, vol. 129, pp. 221–238, 2008.
[8] M. H. Mozaffari, H. Abdi and S.-H. Zahiri, "An inclined planes system optimization algorithm", Information Sciences, Elsevier (submitted).
[9] S. H. Zahiri, "Introduction of a novel unsupervised clustering method based on artificial immune system", Iranian Journal of Computer and Electrical Engineering, year 6, vol. 2, summer 2008.
[10] A. Hatamlou, "Application of gravitational search algorithm on data clustering", Springer-Verlag Berlin Heidelberg, 2011.
